Skip to content

Commit 23fc593

Browse files
author
Dan
committed
Improved thrust interop tutorial.
1 parent 09d392f commit 23fc593

File tree

4 files changed

+104
-47
lines changed

4 files changed

+104
-47
lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,4 +21,5 @@ bin/
2121
CMakeCache.txt
2222
*.suo
2323
*.log
24-
*.tlog
24+
*.tlog
25+
build
Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
1+
Using a cv::cuda::GpuMat with thrust
2+
===========================================
3+
4+
Goal
5+
----
6+
7+
Thrust is an extremely powerful library for various cuda accelerated algorithms. However thrust is designed
8+
to work with vectors and not pitched matricies. The following tutorial will discuss wrapping cv::cuda::GpuMat's
9+
into thrust iterators that can be used with thrust algorithms.
10+
11+
This tutorial should show you how to:
12+
- Wrap a GpuMat into a thrust iterator
13+
- Fill a GpuMat with random numbers
14+
- Sort a column of a GpuMat in place
15+
- Copy values greater than 0 to a new gpu matrix
16+
- Use streams with thrust
17+
18+
Wrapping a GpuMat into a thrust iterator
19+
----
20+
21+
The following code will produce an iterator for a GpuMat
22+
23+
@snippet samples/cpp/tutorial_code/gpu/gpu-thrust-interop/Thrust_interop.hpp begin_itr
24+
@snippet samples/cpp/tutorial_code/gpu/gpu-thrust-interop/Thrust_interop.hpp end_itr
25+
26+
Our goal is to have an iterator that will start at the beginning of the matrix, and increment correctly to access continuous matrix elements. This is trivial for a continuous row, but how about for a column
27+
of a pitched matrix? To do this we need the iterator to be aware of the matrix dimensions and step. This information is embedded in the step_functor.
28+
@snippet samples/cpp/tutorial_code/gpu/gpu-thrust-interop/Thrust_interop.hpp step_functor
29+
The step functor takes in an index value and returns the appropriate
30+
offset from the beginning of the matrix. The counting iterator simply increments over the range of pixel elements. Combined into the transform_iterator we have an iterator that counts from 0 to M*N and correctly
31+
increments to account for the pitched memory of a GpuMat. Unfortunately this does not include any memory location information, for that we need a thrust::device_ptr. By combining a device pointer with the
32+
transform_iterator we can point thrust to the first element of our matrix and have it step accordingly.
33+
34+
Fill a GpuMat with random numbers
35+
----
36+
Now that we have some nice functions for making iterators for thrust, lets use them to do some things OpenCV can't do. Unfortunately at the time of this writing, OpenCV doesn't have any Gpu random number generation.
37+
Thankfully thrust does and it's now trivial to interop between the two.
38+
Example taken from http://stackoverflow.com/questions/12614164/generating-a-random-number-vector-between-0-and-1-0-using-thrust
39+
40+
First we need to write a functor that will produce our random values.
41+
@snippet samples/cpp/tutorial_code/gpu/gpu-thrust-interop/main.cu prg
42+
43+
This will take in an integer value and output a value between a and b.
44+
Now we will populate our matrix with values between 0 and 10 with a thrust transform.
45+
@snippet samples/cpp/tutorial_code/gpu/gpu-thrust-interop/main.cu random
46+
47+
Sort a column of a GpuMat in place
48+
----
49+
50+
Lets fill matrix elements with random values and an index. Afterwards we will sort the random numbers and the indecies.
51+
@snippet samples/cpp/tutorial_code/gpu/gpu-thrust-interop/main.cu sort
52+
53+
Copy values greater than 0 to a new gpu matrix while using streams
54+
----
55+
In this example we're going to see how cv::cuda::Streams can be used with thrust. Unfortunately this specific example uses functions that must return
56+
results to the CPU so it isn't the optimal use of streams.
57+
58+
@snippet samples/cpp/tutorial_code/gpu/gpu-thrust-interop/main.cu copy_greater
59+
60+
61+
First we will populate a GPU mat with randomly generated data between -1 and 1 on a stream.
62+
63+
@snippet samples/cpp/tutorial_code/gpu/gpu-thrust-interop/main.cu random_gen_stream
64+
65+
Notice the use of thrust::system::cuda::par.on(...), this creates an execution policy for executing thrust code on a stream.
66+
There is a bug in the version of thrust distributed with the cuda toolkit, as of version 7.5 this has not been fixed. This bug causes code to not execute on streams.
67+
The bug can however be fixed by using the newest version of thrust from the git repository. (http://github.com/thrust/thrust.git)
68+
Next we will determine how many values are greater than 0 by using thrust::count_if with the following predicate:
69+
70+
@snippet samples/cpp/tutorial_code/gpu/gpu-thrust-interop/main.cu pred_greater
71+
72+
We will use those results to create an output buffer for storing the copied values, we will then use copy_if with the same predicate to populate the output buffer.
73+
Lastly we will download the values into a CPU mat for viewing.

samples/cpp/tutorial_code/gpu/gpu-thrust-interop/Thrust_interop.hpp

Lines changed: 14 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -6,20 +6,10 @@
66
#include <thrust/iterator/counting_iterator.h>
77
#include <thrust/device_ptr.h>
88

9-
template<typename T> struct
10-
CV_TYPE
11-
{
12-
static const int DEPTH;
13-
};
14-
15-
template<> static const int CV_TYPE<float>::DEPTH = CV_32F;
16-
template<> static const int CV_TYPE<double>::DEPTH = CV_64F;
17-
template<> static const int CV_TYPE<int>::DEPTH = CV_32S;
18-
template<> static const int CV_TYPE<uchar>::DEPTH = CV_8U;
19-
template<> static const int CV_TYPE<char>::DEPTH = CV_8S;
20-
template<> static const int CV_TYPE<ushort>::DEPTH = CV_16U;
21-
template<> static const int CV_TYPE<short>::DEPTH = CV_16S;
22-
9+
/*
10+
@Brief step_functor is an object to correctly step a thrust iterator according to the stride of a matrix
11+
*/
12+
//! [step_functor]
2313
template<typename T> struct step_functor : public thrust::unary_function<int, int>
2414
{
2515
int columns;
@@ -41,7 +31,8 @@ template<typename T> struct step_functor : public thrust::unary_function<int, in
4131
return idx;
4232
}
4333
};
44-
34+
//! [step_functor]
35+
//! [begin_itr]
4536
/*
4637
@Brief GpuMatBeginItr returns a thrust compatible iterator to the beginning of a GPU mat's memory.
4738
@Param mat is the input matrix
@@ -52,11 +43,13 @@ thrust::permutation_iterator<thrust::device_ptr<T>, thrust::transform_iterator<s
5243
{
5344
if (channel == -1)
5445
mat = mat.reshape(1);
55-
CV_Assert(mat.depth() == CV_TYPE<T>::DEPTH);
46+
CV_Assert(mat.depth() == cv::DataType<T>::depth);
5647
CV_Assert(channel < mat.channels());
5748
return thrust::make_permutation_iterator(thrust::device_pointer_cast(mat.ptr<T>(0) + channel),
5849
thrust::make_transform_iterator(thrust::make_counting_iterator(0), step_functor<T>(mat.cols, mat.step / sizeof(T), mat.channels())));
5950
}
51+
//! [begin_itr]
52+
//! [end_itr]
6053
/*
6154
@Brief GpuMatEndItr returns a thrust compatible iterator to the end of a GPU mat's memory.
6255
@Param mat is the input matrix
@@ -67,8 +60,11 @@ thrust::permutation_iterator<thrust::device_ptr<T>, thrust::transform_iterator<s
6760
{
6861
if (channel == -1)
6962
mat = mat.reshape(1);
70-
CV_Assert(mat.depth() == CV_TYPE<T>::DEPTH);
63+
CV_Assert(mat.depth() == cv::DataType<T>::depth);
7164
CV_Assert(channel < mat.channels());
7265
return thrust::make_permutation_iterator(thrust::device_pointer_cast(mat.ptr<T>(0) + channel),
7366
thrust::make_transform_iterator(thrust::make_counting_iterator(mat.rows*mat.cols), step_functor<T>(mat.cols, mat.step / sizeof(T), mat.channels())));
74-
}
67+
}
68+
//! [end_itr]
69+
70+

samples/cpp/tutorial_code/gpu/gpu-thrust-interop/main.cu

Lines changed: 15 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
#include <thrust/random.h>
66
#include <thrust/sort.h>
77
#include <thrust/system/cuda/execution_policy.h>
8+
//! [prg]
89
struct prg
910
{
1011
float a, b;
@@ -22,36 +23,10 @@ struct prg
2223
return dist(rng);
2324
}
2425
};
26+
//! [prg]
2527

26-
template<typename T> struct pred_eq
27-
{
28-
T value;
29-
int channel;
30-
__host__ __device__
31-
pred_eq(T value_, int channel_ = 0) :value(value_), channel(channel_){}
32-
__host__ __device__
33-
bool operator()(const T val) const
34-
{
35-
return val == value;
36-
}
37-
template<int N>
38-
__host__ __device__ bool operator()(const cv::Vec<T, N>& val)
39-
{
40-
if (channel < N)
41-
return val.val[channel] == value;
42-
return false;
43-
}
4428

45-
__host__ __device__ bool operator()( const thrust::tuple<T, T, T>& val)
46-
{
47-
if (channel == 0)
48-
return thrust::get<0>(val) == value;
49-
if (channel == 1)
50-
return thrust::get<1>(val) == value;
51-
if (channel == 2)
52-
return thrust::get<2>(val) == value;
53-
}
54-
};
29+
//! [pred_greater]
5530
template<typename T> struct pred_greater
5631
{
5732
T value;
@@ -61,12 +36,14 @@ template<typename T> struct pred_greater
6136
return val > value;
6237
}
6338
};
39+
//! [pred_greater]
6440

6541

6642
int main(void)
6743
{
6844
// Generate a 2 channel row matrix with 100 elements. Set the first channel to be the element index, and the second to be a randomly
6945
// generated value. Sort by the randomly generated value while maintaining index association.
46+
//! [sort]
7047
{
7148
cv::cuda::GpuMat d_idx(1, 100, CV_32SC2);
7249

@@ -82,8 +59,10 @@ int main(void)
8259

8360
cv::Mat h_idx(d_idx);
8461
}
62+
//! [sort]
8563

8664
// Randomly fill a row matrix with 100 elements between -1 and 1
65+
//! [random]
8766
{
8867
cv::cuda::GpuMat d_value(1, 100, CV_32F);
8968
auto valueBegin = GpuMatBeginItr<float>(d_value);
@@ -92,27 +71,35 @@ int main(void)
9271

9372
cv::Mat h_value(d_value);
9473
}
74+
//! [random]
9575

9676
// OpenCV has count non zero, but what if you want to count a specific value?
77+
//! [count_value]
9778
{
9879
cv::cuda::GpuMat d_value(1, 100, CV_32S);
9980
d_value.setTo(cv::Scalar(0));
10081
d_value.colRange(10, 50).setTo(cv::Scalar(15));
10182
auto count = thrust::count(GpuMatBeginItr<int>(d_value), GpuMatEndItr<int>(d_value), 15);
10283
std::cout << count << std::endl;
10384
}
85+
//! [count_value]
86+
10487
// Randomly fill an array then copy only values greater than 0. Perform these tasks on a stream.
88+
//! [copy_greater]
10589
{
10690
cv::cuda::GpuMat d_value(1, 100, CV_32F);
10791
auto valueBegin = GpuMatBeginItr<float>(d_value);
10892
auto valueEnd = GpuMatEndItr<float>(d_value);
10993
cv::cuda::Stream stream;
94+
//! [random_gen_stream]
11095
thrust::transform(thrust::system::cuda::par.on(cv::cuda::StreamAccessor::getStream(stream)), thrust::make_counting_iterator(0), thrust::make_counting_iterator(d_value.cols), valueBegin, prg(-1, 1));
96+
//! [random_gen_stream]
11197
int count = thrust::count_if(thrust::system::cuda::par.on(cv::cuda::StreamAccessor::getStream(stream)), valueBegin, valueEnd, pred_greater<float>(0.0));
11298
cv::cuda::GpuMat d_valueGreater(1, count, CV_32F);
11399
thrust::copy_if(thrust::system::cuda::par.on(cv::cuda::StreamAccessor::getStream(stream)), valueBegin, valueEnd, GpuMatBeginItr<float>(d_valueGreater), pred_greater<float>(0.0));
114100
cv::Mat h_greater(d_valueGreater);
115101
}
102+
//! [copy_greater]
116103

117104
return 0;
118105
}

0 commit comments

Comments
 (0)