Non-default streams for filling matrix #32


Merged · 10 commits · Jul 29, 2020

Conversation

Tanvi141
Contributor

References to other Issues or PRs or Relevant literature

Fixes #2

Brief description of what is fixed or changed

Implements matrix filling in adaboost::cuda::core using non-default streams. The number of streams is passed as a parameter to the function, and each row of the matrix is filled by one of the streams, chosen in round-robin fashion.

Other comments

The initial code for filling with n streams was written by @fiza11. @Tanvi141 integrated that code into this code base and implemented the round-robin scheduling.
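The round-robin scheme described above can be sketched roughly as follows. This is an illustrative sketch only, not the repo's actual API: the kernel name, the raw-pointer signature, and the parameter names are assumptions; the real code works on MatrixGPU objects.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: one launch fills one row of the matrix.
template <class T>
__global__ void fill_row_kernel(T* row, unsigned cols, T value)
{
    unsigned col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col < cols)
        row[col] = value;
}

// Hypothetical host wrapper: each row's kernel is launched on
// stream i % n_streams, so launches on different streams may overlap.
template <class T>
void fill(T value, T* d_mat, unsigned rows, unsigned cols, unsigned n_streams)
{
    std::vector<cudaStream_t> streams(n_streams);
    for (auto& s : streams)
        cudaStreamCreate(&s);

    unsigned threads = 256;
    unsigned blocks = (cols + threads - 1) / threads;
    for (unsigned i = 0; i < rows; ++i)
    {
        // Round-robin stream selection.
        fill_row_kernel<<<blocks, threads, 0, streams[i % n_streams]>>>(
            d_mat + i * cols, cols, value);
    }

    // Wait for all streams before tearing them down.
    for (auto& s : streams)
    {
        cudaStreamSynchronize(s);
        cudaStreamDestroy(s);
    }
}
```

Because kernels launched on distinct non-default streams can execute concurrently, rows assigned to different streams may be filled in parallel, which is the point of passing n_streams instead of a block geometry.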

@Tanvi141
Contributor Author

Posting the build and test reports here.

-- The CXX compiler identification is GNU 7.5.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda-10.2/bin/nvcc
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/tanvi/OpenSource/AdaBoost/build-adaboost
Scanning dependencies of target adaboost_utils
Scanning dependencies of target adaboost_cuda_wrappers
Scanning dependencies of target adaboost_core
Scanning dependencies of target adaboost_cuda
[  5%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda_wrappers.dir/cuda/utils/cuda_wrappers_impl.cu.o
[ 11%] Building CXX object adaboost/CMakeFiles/adaboost_core.dir/core/data_structures_impl.cpp.o
[ 16%] Building CXX object adaboost/CMakeFiles/adaboost_utils.dir/utils/utils_impl.cpp.o
[ 22%] Building CXX object adaboost/CMakeFiles/adaboost_core.dir/core/operations_impl.cpp.o
[ 27%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda.dir/cuda/core/cuda_data_structures_impl.cu.o
[ 33%] Linking CXX shared library ../libs/libadaboost_utils.so
[ 38%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda.dir/cuda/core/operations_impl.cu.o
[ 38%] Built target adaboost_utils
[ 44%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda.dir/cuda/utils/cuda_wrappers_impl.cu.o
[ 50%] Linking CUDA device code CMakeFiles/adaboost_cuda_wrappers.dir/cmake_device_link.o
[ 55%] Linking CUDA shared library ../libs/libadaboost_cuda_wrappers.so
[ 55%] Built target adaboost_cuda_wrappers
[ 61%] Building CXX object adaboost/CMakeFiles/adaboost_core.dir/utils/utils_impl.cpp.o
/home/tanvi/OpenSource/AdaBoost/adaboost/adaboost/cuda/core/operations_impl.cu(96): warning: 'long double' is treated as 'double' in device code

/home/tanvi/OpenSource/AdaBoost/adaboost/adaboost/cuda/core/operations_impl.cu(215): warning: 'long double' is treated as 'double' in device code

Warning: 'long double' is treated as 'double' in device code

Warning: 'long double' is treated as 'double' in device code

[ 66%] Linking CXX shared library ../libs/libadaboost_core.so
[ 66%] Built target adaboost_core
Scanning dependencies of target test_core
[ 72%] Building CXX object adaboost/CMakeFiles/test_core.dir/tests/test_core.cpp.o
[ 77%] Linking CXX executable ../bin/test_core
[ 77%] Built target test_core
[ 83%] Linking CUDA device code CMakeFiles/adaboost_cuda.dir/cmake_device_link.o
[ 88%] Linking CUDA shared library ../libs/libadaboost_cuda.so
[ 88%] Built target adaboost_cuda
Scanning dependencies of target test_cuda
[ 94%] Building CXX object adaboost/CMakeFiles/test_cuda.dir/tests/test_cuda.cpp.o
[100%] Linking CXX executable ../bin/test_cuda
[100%] Built target test_cuda
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from Core
[ RUN      ] Core.Vector
[       OK ] Core.Vector (0 ms)
[ RUN      ] Core.Matrices
[       OK ] Core.Matrices (0 ms)
[ RUN      ] Core.Sum
[       OK ] Core.Sum (0 ms)
[ RUN      ] Core.Argmax
[       OK ] Core.Argmax (0 ms)
[----------] 4 tests from Core (0 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (0 ms total)
[  PASSED  ] 4 tests.
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from Cuda
[ RUN      ] Cuda.VectorGPU
[       OK ] Cuda.VectorGPU (51 ms)
[ RUN      ] Cuda.MatrixGPU
[       OK ] Cuda.MatrixGPU (41 ms)
[ RUN      ] Cuda.MatricesGPU
[       OK ] Cuda.MatricesGPU (311 ms)
[----------] 3 tests from Cuda (403 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (403 ms total)
[  PASSED  ] 3 tests.

@@ -34,8 +34,7 @@ namespace adaboost
*/

template <class data_type_matrix>
void fill(const data_type_matrix value, const MatrixGPU<data_type_matrix>&mat, unsigned block_size_x, unsigned block_size_y);
Member

Why is this deleted?

Contributor Author

Because the current function takes only one parameter, the number of streams, instead of block_size_x and block_size_y, the API is now different.

Member

Please do not deprecate APIs unless absolutely necessary. Keep both APIs; we will use whichever is required while implementing the AdaBoost algorithm.
Just copy the original fill function and its related tests from the master branch and add them in the right places to avoid conflicts. Do not modify the changes in your patch; simply pick the right code from master and paste them into your branch.
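Keeping both APIs would mean two fill overloads living side by side, roughly like this. The block-size prototype is taken from the diff quoted earlier in this thread; the stream-based signature is an assumption about the new API:

```cuda
// Sketch: the original block-geometry overload (from master) ...
template <class data_type_matrix>
void fill(const data_type_matrix value,
          const MatrixGPU<data_type_matrix>& mat,
          unsigned block_size_x, unsigned block_size_y);

// ... alongside the new stream-based overload (assumed signature).
template <class data_type_matrix>
void fill(const data_type_matrix value,
          const MatrixGPU<data_type_matrix>& mat,
          unsigned n_streams);
```

Overload resolution keeps the two unambiguous: a call with one trailing unsigned argument selects the stream version, and a call with two selects the block-geometry version.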

Contributor Author

Done, please have a look

@czgdp1807
Member

Please add doc strings as well. See the existing code for the documentation style and write similar docs for the new functions.

@Tanvi141
Contributor Author

@czgdp1807, is this ready to merge?

@czgdp1807
Member

[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from Cuda
[ RUN      ] Cuda.VectorGPU
[       OK ] Cuda.VectorGPU (50 ms)
[ RUN      ] Cuda.MatrixGPU
[       OK ] Cuda.MatrixGPU (3323 ms)
[ RUN      ] Cuda.MatricesGPU
[       OK ] Cuda.MatricesGPU (766 ms)
[----------] 3 tests from Cuda (4139 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (4139 ms total)
[  PASSED  ] 3 tests.
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from Core
[ RUN      ] Core.Vector
[       OK ] Core.Vector (1 ms)
[ RUN      ] Core.Matrices
[       OK ] Core.Matrices (0 ms)
[ RUN      ] Core.Sum
[       OK ] Core.Sum (0 ms)
[ RUN      ] Core.Argmax
[       OK ] Core.Argmax (0 ms)
[----------] 4 tests from Core (1 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (1 ms total)
[  PASSED  ] 4 tests.

@czgdp1807 czgdp1807 merged commit 1f083c7 into codezonediitj:master Jul 29, 2020
@czgdp1807
Member

Please use https://github.com/codezonediitj/utils/blob/master/create_template.py for creating template instantiations for function prototypes automatically.

Successfully merging this pull request may close these issues.

Using non-default streams in CUDA
2 participants