Non-default streams for filling matrix #32


Merged · 10 commits · Jul 29, 2020

Conversation

Tanvi141
Contributor

References to other Issues or PRs or Relevant literature

Fixes #2

Brief description of what is fixed or changed

Implements matrix filling in adaboost::cuda::core using non-default streams. The number of streams is passed as a parameter to the function, and each row of the matrix is filled by one of the streams, chosen in round-robin fashion.

Other comments

The initial code for filling with n streams was written by @fiza11. @Tanvi141 integrated that code into this code base and implemented the round-robin scheduling.
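The round-robin scheme described above can be sketched roughly as follows. This is an illustrative sketch only, not the repo's actual API: the kernel name, the raw-pointer signature, and the parameter names are assumptions; the real code works on MatrixGPU objects.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: one launch fills one row of the matrix.
template <class T>
__global__ void fill_row_kernel(T* row, unsigned cols, T value)
{
    unsigned col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col < cols)
        row[col] = value;
}

// Hypothetical host wrapper: each row's kernel is launched on
// stream i % n_streams, so launches on different streams may overlap.
template <class T>
void fill(T value, T* d_mat, unsigned rows, unsigned cols, unsigned n_streams)
{
    std::vector<cudaStream_t> streams(n_streams);
    for (auto& s : streams)
        cudaStreamCreate(&s);

    unsigned threads = 256;
    unsigned blocks = (cols + threads - 1) / threads;
    for (unsigned i = 0; i < rows; ++i)
    {
        // Round-robin stream selection.
        fill_row_kernel<<<blocks, threads, 0, streams[i % n_streams]>>>(
            d_mat + i * cols, cols, value);
    }

    // Wait for all streams before tearing them down.
    for (auto& s : streams)
    {
        cudaStreamSynchronize(s);
        cudaStreamDestroy(s);
    }
}
```

Because kernels launched on distinct non-default streams can execute concurrently, rows assigned to different streams may be filled in parallel, which is the point of passing n_streams instead of a block geometry.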

@Tanvi141
Contributor Author

Posting the build and test reports here.

-- The CXX compiler identification is GNU 7.5.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda-10.2/bin/nvcc
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/tanvi/OpenSource/AdaBoost/build-adaboost
Scanning dependencies of target adaboost_utils
Scanning dependencies of target adaboost_cuda_wrappers
Scanning dependencies of target adaboost_core
Scanning dependencies of target adaboost_cuda
[  5%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda_wrappers.dir/cuda/utils/cuda_wrappers_impl.cu.o
[ 11%] Building CXX object adaboost/CMakeFiles/adaboost_core.dir/core/data_structures_impl.cpp.o
[ 16%] Building CXX object adaboost/CMakeFiles/adaboost_utils.dir/utils/utils_impl.cpp.o
[ 22%] Building CXX object adaboost/CMakeFiles/adaboost_core.dir/core/operations_impl.cpp.o
[ 27%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda.dir/cuda/core/cuda_data_structures_impl.cu.o
[ 33%] Linking CXX shared library ../libs/libadaboost_utils.so
[ 38%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda.dir/cuda/core/operations_impl.cu.o
[ 38%] Built target adaboost_utils
[ 44%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda.dir/cuda/utils/cuda_wrappers_impl.cu.o
[ 50%] Linking CUDA device code CMakeFiles/adaboost_cuda_wrappers.dir/cmake_device_link.o
[ 55%] Linking CUDA shared library ../libs/libadaboost_cuda_wrappers.so
[ 55%] Built target adaboost_cuda_wrappers
[ 61%] Building CXX object adaboost/CMakeFiles/adaboost_core.dir/utils/utils_impl.cpp.o
/home/tanvi/OpenSource/AdaBoost/adaboost/adaboost/cuda/core/operations_impl.cu(96): warning: 'long double' is treated as 'double' in device code

/home/tanvi/OpenSource/AdaBoost/adaboost/adaboost/cuda/core/operations_impl.cu(215): warning: 'long double' is treated as 'double' in device code

Warning: 'long double' is treated as 'double' in device code

Warning: 'long double' is treated as 'double' in device code

[ 66%] Linking CXX shared library ../libs/libadaboost_core.so
[ 66%] Built target adaboost_core
Scanning dependencies of target test_core
[ 72%] Building CXX object adaboost/CMakeFiles/test_core.dir/tests/test_core.cpp.o
[ 77%] Linking CXX executable ../bin/test_core
[ 77%] Built target test_core
[ 83%] Linking CUDA device code CMakeFiles/adaboost_cuda.dir/cmake_device_link.o
[ 88%] Linking CUDA shared library ../libs/libadaboost_cuda.so
[ 88%] Built target adaboost_cuda
Scanning dependencies of target test_cuda
[ 94%] Building CXX object adaboost/CMakeFiles/test_cuda.dir/tests/test_cuda.cpp.o
[100%] Linking CXX executable ../bin/test_cuda
[100%] Built target test_cuda
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from Core
[ RUN      ] Core.Vector
[       OK ] Core.Vector (0 ms)
[ RUN      ] Core.Matrices
[       OK ] Core.Matrices (0 ms)
[ RUN      ] Core.Sum
[       OK ] Core.Sum (0 ms)
[ RUN      ] Core.Argmax
[       OK ] Core.Argmax (0 ms)
[----------] 4 tests from Core (0 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (0 ms total)
[  PASSED  ] 4 tests.
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from Cuda
[ RUN      ] Cuda.VectorGPU
[       OK ] Cuda.VectorGPU (51 ms)
[ RUN      ] Cuda.MatrixGPU
[       OK ] Cuda.MatrixGPU (41 ms)
[ RUN      ] Cuda.MatricesGPU
[       OK ] Cuda.MatricesGPU (311 ms)
[----------] 3 tests from Cuda (403 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (403 ms total)
[  PASSED  ] 3 tests.

@@ -34,8 +34,7 @@ namespace adaboost
*/

template <class data_type_matrix>
void fill(const data_type_matrix value, const MatrixGPU<data_type_matrix>&mat, unsigned block_size_x, unsigned block_size_y);
Member

Why is this deleted?

Contributor Author

Because the current function takes only one parameter, the number of streams, instead of block_size_x and block_size_y, the API is now different.

Member

Please do not deprecate APIs unless absolutely necessary. Keep both APIs; we will use whichever is required while implementing the AdaBoost algorithm.
Just copy the original fill function and its related tests from the master branch and add them in the right places to avoid conflicts. Do not modify the changes in your patch; simply pick the right code from master and paste them into your branch.
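Keeping both APIs would mean two fill overloads living side by side, roughly like this. The block-size prototype is taken from the diff quoted earlier in this thread; the stream-based signature is an assumption about the new API:

```cuda
// Sketch: the original block-geometry overload (from master) ...
template <class data_type_matrix>
void fill(const data_type_matrix value,
          const MatrixGPU<data_type_matrix>& mat,
          unsigned block_size_x, unsigned block_size_y);

// ... alongside the new stream-based overload (assumed signature).
template <class data_type_matrix>
void fill(const data_type_matrix value,
          const MatrixGPU<data_type_matrix>& mat,
          unsigned n_streams);
```

Overload resolution keeps the two unambiguous: a call with one trailing unsigned argument selects the stream version, and a call with two selects the block-geometry version.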

Contributor Author

Done, please have a look

@czgdp1807
Member

Please add doc strings as well. See the existing code for the documentation style and write similar docs for the new functions.

@Tanvi141
Contributor Author

@czgdp1807, is this ready to merge?

@czgdp1807
Member

[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from Cuda
[ RUN      ] Cuda.VectorGPU
[       OK ] Cuda.VectorGPU (50 ms)
[ RUN      ] Cuda.MatrixGPU
[       OK ] Cuda.MatrixGPU (3323 ms)
[ RUN      ] Cuda.MatricesGPU
[       OK ] Cuda.MatricesGPU (766 ms)
[----------] 3 tests from Cuda (4139 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (4139 ms total)
[  PASSED  ] 3 tests.
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from Core
[ RUN      ] Core.Vector
[       OK ] Core.Vector (1 ms)
[ RUN      ] Core.Matrices
[       OK ] Core.Matrices (0 ms)
[ RUN      ] Core.Sum
[       OK ] Core.Sum (0 ms)
[ RUN      ] Core.Argmax
[       OK ] Core.Argmax (0 ms)
[----------] 4 tests from Core (1 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (1 ms total)
[  PASSED  ] 4 tests.

@czgdp1807 czgdp1807 merged commit 1f083c7 into codezonediitj:master Jul 29, 2020
@czgdp1807
Member

Please use https://github.com/codezonediitj/utils/blob/master/create_template.py for creating template instantiations for function prototypes automatically.

Successfully merging this pull request may close these issues.

Using non-default streams in CUDA
2 participants