Changes from all commits (27 commits)
- `6400b07` add development dependencies (simsurace, Mar 26, 2021)
- `68fa420` replace GPUArrays.gpu_rand by rand(Float32), remove rng arguments (simsurace, Mar 26, 2021)
- `377a39a` add compat entries (simsurace, Mar 26, 2021)
- `837182d` fix issue #3: more robust indexing (simsurace, Mar 26, 2021)
- `64c037f` Revert "fix issue #3: more robust indexing" (simsurace, Mar 26, 2021)
- `56cc07e` Merge branch 'master' into new_rand (simsurace, Mar 26, 2021)
- `9a6b020` remove spurious rng argument (simsurace, Mar 26, 2021)
- `021b623` remove randstates argument (simsurace, Mar 26, 2021)
- `1c97e8a` remove support for Julia 1.5 (simsurace, Mar 26, 2021)
- `04ba94a` remove GPUArrays dep, add manifest (simsurace, Mar 26, 2021)
- `7640366` add GPUCompiler dep by hand (simsurace, Mar 26, 2021)
- `fed8c64` Merge pull request #9 from JuliaGPU/new_rand (simsurace, Mar 26, 2021)
- `8679856` tweaks, go to version 0.2.4 (simsurace, Mar 26, 2021)
- `35905ee` fix overwrite message; wrong type constraint (simsurace, Mar 31, 2021)
- `4cd4cde` fix dependencies (simsurace, Mar 31, 2021)
- `7ada966` add comment about CUDA (simsurace, Mar 31, 2021)
- `c5185ab` Update README.md (simsurace, Mar 31, 2021)
- `9df8b29` Update README.md (simsurace, Mar 31, 2021)
- `014af15` formatting (simsurace, Mar 31, 2021)
- `a7dde9d` clarify readme (simsurace, Apr 1, 2021)
- `989eec3` update (simsurace, Apr 1, 2021)
- `dcc40da` first try with CUDA v3, broken! (simsurace, Apr 12, 2021)
- `866c69d` try out naive kernel without deadlock (simsurace, Apr 13, 2021)
- `4515a70` fix typo (simsurace, Apr 13, 2021)
- `7953f17` remove manifest (simsurace, Apr 28, 2021)
- `eb1df58` ignore manifest (simsurace, Apr 28, 2021)
- `6242d0e` float32 cosmetics (simsurace, Apr 28, 2021)
20 changes: 0 additions & 20 deletions .buildkite/pipeline.yml
@@ -1,26 +1,6 @@
steps:
# Julia versions

- label: "Julia 1.5, CUDA 11.2"
plugins:
- JuliaCI/julia#v1:
version: 1.5
- JuliaCI/julia-test#v1:
test_args: "--thorough"
- JuliaCI/julia-coverage#v1:
codecov: true
dirs:
- src
agents:
queue: "juliagpu"
cuda: "11.2"
cap: "recent"
env:
JULIA_CUDA_VERSION: '11.2'
JULIA_CUDA_USE_BINARYBUILDER: 'true'
if: build.message !~ /\[skip tests\]/
timeout_in_minutes: 120

- label: "Julia 1.6, CUDA 11.2"
plugins:
- JuliaCI/julia#v1:
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,6 +1,7 @@
*.jl.*.cov
*.jl.cov
*.jl.mem
/Manifest.toml

test.jl

Manifest.toml
8 changes: 3 additions & 5 deletions Project.toml
@@ -1,18 +1,16 @@
name = "BinomialGPU"
uuid = "c5bbfde1-2136-42cd-9b65-d5719df69ebf"
authors = ["Simone Carlo Surace"]
version = "0.2.3"
version = "0.2.4"

[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
GPUArrays = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"

[compat]
BenchmarkTools = "0.6"
CUDA = "2"
GPUArrays = "6"
julia = "1.5"
CUDA = "3"
julia = "1.6"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
26 changes: 18 additions & 8 deletions README.md
@@ -8,12 +8,23 @@ This package provides a function `rand_binomial!` to produce `CuArrays` with bin

## Installation

Use the built-in package manager:
Use the built-in package manager, accessed in Julia via `]`:

```julia
import Pkg; Pkg.add("BinomialGPU")
(@v1.6) pkg> add BinomialGPU#experimental_rand
```

To use this branch of this package, you also need to install this branch of CUDA.jl:

```julia
(@v1.6) pkg> add CUDA#tb/speedup_rand
```

If you do not want to do this, use the latest release, which has the same functionality but is somewhat slower:

```julia
(@v1.6) pkg> add BinomialGPU
```

## Usage

@@ -38,13 +49,12 @@
counts = rand(1:128, 2, 4)
probs = CUDA.rand(2)
rand_binomial!(A, count = counts, prob = probs)
```
or arbitrary combinations of scalars and arrays of parameters.
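The parameter combinations above can be sketched end to end as follows. This is a minimal usage sketch based on the snippets in this README; it assumes a CUDA-capable GPU is available, and the array shapes are illustrative:

```julia
# Usage sketch for BinomialGPU (requires a CUDA-capable GPU).
using CUDA
using BinomialGPU

A = CUDA.zeros(Int, 2, 4)      # output array, filled in place

# scalar parameters
rand_binomial!(A, count = 128, prob = 0.5f0)

# array parameters: counts on the CPU, probabilities on the GPU
counts = rand(1:128, 2, 4)
probs  = CUDA.rand(2)
rand_binomial!(A, count = counts, prob = probs)
```

Note that `rand_binomial!` mutates `A` in place and returns it, following the Julia convention that mutating functions end in `!`.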


## Issues

* There is currently a bug (see [issue #3](https://github.com/JuliaGPU/BinomialGPU.jl/issues/3)): if the dimension of the sampled array is equal to or larger than the dimension of either `count` or `prob` plus 3, an error is thrown. Other sizes work fine.
* The sampler is fast: it is about one order of magnitude faster than other samplers. But it is still an open question whether it can be made faster, whether there are other samplers with competitive speed, and it shows some non-intuitive behavior:
* The functionality to draw random numbers within CUDA.jl kernels is still under development. A new function `rand()` has recently become available, but it hasn't been tried within this package. See [issue #7](https://github.com/JuliaGPU/BinomialGPU.jl/issues/7).
* The speed is faster in Julia 1.5.4 than in the current Julia 1.6 release candidate. See [issue #8](https://github.com/JuliaGPU/BinomialGPU.jl/issues/8).
* The speed is slower when using optimal thread allocation than when defaulting to 256 threads. See [issue #2](https://github.com/JuliaGPU/BinomialGPU.jl/issues/2)
* Are there any other samplers that are comparably fast or faster? I compared the following: sample an array of size `(1024, 1024)` with `count = 128` and `prob` of size `(1024, 1024)` with uniformly drawn entries. Timings on an RTX2070 card: BinomialGPU.jl 1.4ms, PyTorch 11ms, CuPy 18ms, tensorflow 400ms. Please let me know if you know samplers that are not yet listed.
The sampler is fast: about one order of magnitude faster than other samplers. But it is still an open question whether it can be made faster and whether there are other samplers with competitive speed, and it shows some non-intuitive behavior:
* Sampling is faster in Julia 1.5.4 than in the current Julia 1.6 release candidate. See [issue #8](https://github.com/JuliaGPU/BinomialGPU.jl/issues/8).
* Sampling is slower when using optimal thread allocation than when defaulting to 256 threads. See [issue #2](https://github.com/JuliaGPU/BinomialGPU.jl/issues/2).
* Are there any other samplers that are comparably fast or faster? I compared the following benchmark: sample an array of size `(1024, 1024)` with `count = 128` and `prob` of size `(1024, 1024)` with uniformly drawn entries. Timings on an RTX 2070 card: BinomialGPU.jl 0.6 ms, PyTorch 11 ms, CuPy 18 ms, TensorFlow 400 ms. Please let me know if you know of samplers that are not yet listed.
1 change: 0 additions & 1 deletion src/BinomialGPU.jl
@@ -1,7 +1,6 @@
module BinomialGPU

using CUDA
using GPUArrays

# user-level API
include("rand_binomial.jl")