Address comments of Discourse #600
@@ -0,0 +1,135 @@
# GPU-Accelerated Linear Solving in Julia

LinearSolve.jl provides two ways to GPU-accelerate linear solves:

* Offloading: offloading takes a CPU-based problem and automatically transforms it into a
  GPU-based problem in the background, and returns the solution on the CPU. Offloading
  therefore requires no change on the part of the user other than choosing an offloading
  solver.
* Array type interface: the array type interface requires that the user define the
  `LinearProblem` using an `AbstractGPUArray` type and choose an appropriate solver
  (or use the default solver). The solution is then returned as a GPU array type.

The offloading approach has the advantage of being simpler and requiring no change to
existing CPU code, at the cost of more overhead. In the following sections we will
demonstrate how to use each of these approaches.
!!! warn
    GPUs are not always faster! Your matrices need to be sufficiently large in order for
    GPU acceleration to actually be faster. For offloading, the crossover is around
    1,000 x 1,000 matrices, and for the array type interface it's around 100 x 100. For
    sparse matrices, it is highly dependent on the sparsity pattern and the amount of fill-in.
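
To find the crossover point on your own hardware, a timing sketch along the following lines can
help. This is only an illustration: it assumes BenchmarkTools.jl is installed and a CUDA-capable
GPU is available, and it uses the offloading solver demonstrated in the next section; the
1,000 x 1,000 size is just a starting point to vary.

```julia
using LinearSolve, CUDA, BenchmarkTools

n = 1000                 # vary this to find where the GPU starts to win on your hardware
A = rand(n, n)
b = rand(n)
prob = LinearProblem(A, b)

@btime solve($prob)                              # CPU default (LU-based)
@btime solve($prob, CudaOffloadFactorization())  # GPU offloading
```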
## GPU-Offloading

GPU offloading is simple: it only requires changing the solver algorithm. Take the
example from the start of the documentation:
```julia
using LinearSolve

A = rand(4, 4)
b = rand(4)
prob = LinearProblem(A, b)
sol = solve(prob)
sol.u
```
This computation can be moved to the GPU by the following:
```julia
using CUDA # Add the GPU library
sol = solve(prob, CudaOffloadFactorization())
sol.u
```
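Since offloading hands the result back as a CPU array (as described above), it can be checked
directly against the CPU solve. A small sanity check, reusing the 4 x 4 `prob` defined above:

```julia
sol_cpu = solve(prob)
sol_gpu = solve(prob, CudaOffloadFactorization())

sol_gpu.u isa Vector                        # the offloaded solution lives on the CPU
isapprox(sol_cpu.u, sol_gpu.u; rtol = 1e-5) # agreement up to floating-point roundoff
```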
## GPUArray Interface

For more manual control over the factorization setup, you can use the
[GPUArray interface](https://juliagpu.github.io/GPUArrays.jl/dev/), the most common
instantiation being [CuArray for CUDA-based arrays on NVIDIA GPUs](https://cuda.juliagpu.org/stable/usage/array/).
To use this, we simply send the matrix `A` and the vector `b` over to the GPU and solve:
```julia
using CUDA

A = rand(4, 4) |> cu
b = rand(4) |> cu
prob = LinearProblem(A, b)
sol = solve(prob)
sol.u
```
```
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 -27.02665
  16.338171
 -77.650116
 106.335686
```
Notice that the solution is a `CuArray`, and thus one must use `Array(sol.u)` if you wish
to return it to the CPU. This setup does no automated memory transfers and will thus only
move things back to the CPU on command.
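For example:

```julia
u_cpu = Array(sol.u) # explicitly copy the solution back to host (CPU) memory
```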
!!! warn
    Many GPU functionalities, such as `CUDA.cu`, have a built-in preference for `Float32`.
    Generally it is much faster to use 32-bit floating point operations on GPU than 64-bit
    operations, and thus this is usually the right choice when moving to such platforms.
    However, this change in numerical precision needs to be accounted for in your mathematics,
    as it could lead to instabilities. To disable this, use a constructor that is more
    specific about the bitsize, such as `CuArray{Float64}(A)`. Additionally, preferring more
    stable factorization methods, such as `QRFactorization()`, can improve the numerics in
    such cases.
Similarly to other use cases, you can choose the solver, for example:
```julia
sol = solve(prob, QRFactorization())
```
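If you need to keep the computation in 64-bit precision on the GPU (see the warning above), a
minimal sketch of the same workflow with explicit `Float64` arrays:

```julia
using CUDA, LinearSolve

A = rand(4, 4)              # Float64 on the CPU
b = rand(4)
A_gpu = CuArray{Float64}(A) # avoids the implicit Float32 conversion of `cu`
b_gpu = CuArray{Float64}(b)

prob64 = LinearProblem(A_gpu, b_gpu)
sol64 = solve(prob64, QRFactorization())
Array(sol64.u)              # bring the Float64 solution back to the CPU
```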
## Sparse Matrices on GPUs

Currently, sparse matrix computations on GPUs are only supported for CUDA. This is done using
the `CUDA.CUSPARSE` sublibrary.
```julia
using LinearAlgebra, SparseArrays, CUDA.CUSPARSE
T = Float32
n = 100
A_cpu = sprand(T, n, n, 0.05) + I
x_cpu = zeros(T, n)
b_cpu = rand(T, n)

A_gpu_csr = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)
prob = LinearProblem(A_gpu_csr, b_gpu) # the GPU problem used by the solves below
```
In order to solve such problems using a direct method, you must add
[CUDSS.jl](https://github.com/exanauts/CUDSS.jl). This looks like:
```julia
using CUDSS
sol = solve(prob, LUFactorization())
```
!!! note
    For now, CUDSS only supports `CuSparseMatrixCSR` type matrices.
Note that `KrylovJL` methods also work with sparse GPU arrays:
```julia
sol = solve(prob, KrylovJL_GMRES())
```
Note that CUSPARSE also has some GPU-based preconditioners, such as a built-in `ilu`. For example:
```julia
sol = solve(prob, KrylovJL_GMRES(precs = (A, p) -> (CUDA.CUSPARSE.ilu02!(A, 'O'), I)))
```
However, right now CUSPARSE is missing the right `ldiv!` implementation for this to work
in general. See https://github.com/SciML/LinearSolve.jl/issues/341 for details.
@@ -1,9 +1,7 @@
-# Solving Linear Systems in Julia
+# Getting Started with Solving Linear Systems in Julia

-A linear system $$Au=b$$ is specified by defining an `AbstractMatrix` `A`, or
-by providing a matrix-free operator for performing `A*x` operations via the
-function `A(u,p,t)` out-of-place and `A(du,u,p,t)` for in-place. For the sake
-of simplicity, this tutorial will only showcase concrete matrices.
+A linear system $$Au=b$$ is specified by defining an `AbstractMatrix` or `AbstractSciMLOperator`.
+For the sake of simplicity, this tutorial will start by only showcasing concrete matrices.

The following defines a matrix and a `LinearProblem` which is subsequently solved
by the default linear solver.
@@ -34,4 +32,48 @@ sol.u
Thus, a package which uses LinearSolve.jl simply needs to allow the user to
pass in an algorithm struct and all wrapped linear solvers are immediately
-available as tweaks to the general algorithm.
+available as tweaks to the general algorithm. For more information on the
+available solvers, see [the solvers page](@ref linearsystemsolvers).
## Sparse and Structured Matrices

There is no difference in the interface for using LinearSolve.jl on sparse
and structured matrices. For example, the following now uses Julia's
built-in [SparseArrays.jl](https://docs.julialang.org/en/v1/stdlib/SparseArrays/)
to define a sparse matrix (`SparseMatrixCSC`) and solve the system using LinearSolve.jl.
Note that `sprand` is a shorthand for quickly creating a sparse random matrix
(see SparseArrays.jl for more details on defining sparse matrices).
```@example linsys1
using LinearSolve, SparseArrays

A = sprand(4, 4, 0.75)
b = rand(4)
prob = LinearProblem(A, b)
sol = solve(prob)
sol.u

sol = solve(prob, KrylovJL_GMRES()) # Choosing algorithms is done the same way
sol.u
```
Similarly, structured matrix types, like banded matrices, can be provided using special matrix
types. While any `AbstractMatrix` type should be compatible via the general Julia interfaces,
LinearSolve.jl specifically tests with the following cases (see the BandedMatrices.jl sketch
after this list):

* [BandedMatrices.jl](https://github.com/JuliaLinearAlgebra/BandedMatrices.jl)
* [BlockDiagonals.jl](https://github.com/JuliaArrays/BlockDiagonals.jl)
* [CUDA.jl](https://cuda.juliagpu.org/stable/) (CUDA GPU-based dense and sparse matrices)
* [FastAlmostBandedMatrices.jl](https://github.com/SciML/FastAlmostBandedMatrices.jl)
* [Metal.jl](https://metal.juliagpu.org/stable/) (Apple M-series GPU-based dense matrices)
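
As an illustration, here is a small sketch using BandedMatrices.jl (assuming the package is
installed); the interface is identical to the dense and sparse cases above:

```julia
using LinearSolve, BandedMatrices

# A 4x4 tridiagonal banded matrix built from its diagonals
A = BandedMatrix(-1 => rand(3), 0 => rand(4) .+ 2, 1 => rand(3))
b = rand(4)
prob = LinearProblem(A, b)
sol = solve(prob)
sol.u
```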
## Using Matrix-Free Operators

In many cases where a sparse matrix gets really large, even the sparse representation
cannot be stored in memory. However, in many such cases, such as with PDE discretizations,
you may be able to write down a function that directly computes `A*x`. These "matrix-free"
operators allow the user to define the `Ax=b` problem to be solved by giving only the
definition of `A*x`, allowing specific solvers (Krylov methods) to act without ever
constructing the full matrix.
**This will be documented in more detail in the near future**
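
In the meantime, here is a rough sketch of the idea using SciMLOperators.jl's `FunctionOperator`.
The operator applies a tridiagonal stencil in the in-place `A(du, u, p, t)` form; the exact
`FunctionOperator` constructor arguments and trait keywords used here are assumptions, so check
the SciMLOperators.jl documentation for your installed version:

```julia
using LinearSolve, SciMLOperators

# In-place matrix-free action du = A*u for a tridiagonal stencil
# (2 on the diagonal, -1 on the off-diagonals), never forming the matrix.
function laplacian!(du, u, p, t)
    n = length(u)
    for i in 1:n
        du[i] = 2u[i]
        i > 1 && (du[i] -= u[i - 1])
        i < n && (du[i] -= u[i + 1])
    end
    return du
end

u0 = zeros(32)
A = FunctionOperator(laplacian!, u0, u0; islinear = true, issymmetric = true, isposdef = true)
b = rand(32)

prob = LinearProblem(A, b)
sol = solve(prob, KrylovJL_CG()) # Krylov methods only ever need A*u products
sol.u
```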
@@ -117,7 +117,7 @@ function SciMLBase.solve!(cache::LinearCache, alg::LUFactorization; kwargs...)
    end
    cache.cacheval = fact

-   if !LinearAlgebra.issuccess(fact)
+   if hasmethod(LinearAlgebra.issuccess, Tuple{typeof(fact)}) && !LinearAlgebra.issuccess(fact)
        return SciMLBase.build_linear_solution(
            alg, cache.u, nothing, cache; retcode = ReturnCode.Failure)
    end
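
A minimal, self-contained illustration of the `hasmethod` guard used above; the
`FakeFactorization` type is hypothetical, purely for demonstration:

```julia
using LinearAlgebra

struct FakeFactorization end # stand-in for a factorization type with no issuccess method

fact = FakeFactorization()

# Without the hasmethod check, issuccess(fact) would throw a MethodError for such types;
# with it, the failure branch is simply skipped.
if hasmethod(LinearAlgebra.issuccess, Tuple{typeof(fact)}) && !LinearAlgebra.issuccess(fact)
    println("factorization failed")
else
    println("no failure detected; continuing with the solve")
end
```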