diff --git a/docs/pages.jl b/docs/pages.jl
index c336a2dc0..baeff4e84 100644
--- a/docs/pages.jl
+++ b/docs/pages.jl
@@ -1,9 +1,11 @@
 # Put in a separate page so it can be used by SciMLDocs.jl
 
 pages = ["index.md",
-    "Tutorials" => Any["tutorials/linear.md",
+    "tutorials/linear.md",
+    "Tutorials" => Any[
         "tutorials/caching_interface.md",
-        "tutorials/accelerating_choices.md"],
+        "tutorials/accelerating_choices.md",
+        "tutorials/gpu.md"],
     "Basics" => Any["basics/LinearProblem.md",
         "basics/common_solver_opts.md",
         "basics/OperatorAssumptions.md",
diff --git a/docs/src/tutorials/accelerating_choices.md b/docs/src/tutorials/accelerating_choices.md
index 584282f88..387326343 100644
--- a/docs/src/tutorials/accelerating_choices.md
+++ b/docs/src/tutorials/accelerating_choices.md
@@ -78,7 +78,8 @@ should be thought about:
 3. A Krylov subspace method with proper preconditioning will be better than direct solvers
    when the matrices get large enough. You could always precondition a sparse matrix with
    iLU as an easy choice, though the tolerance would need to be tuned in a problem-specific
-   way.
+   way. Please see the [preconditioners page](https://docs.sciml.ai/LinearSolve/stable/basics/Preconditioners/)
+   for more information on defining and using preconditioners.
 
 !!! note
diff --git a/docs/src/tutorials/gpu.md b/docs/src/tutorials/gpu.md
new file mode 100644
index 000000000..4717f2f16
--- /dev/null
+++ b/docs/src/tutorials/gpu.md
@@ -0,0 +1,135 @@
+# GPU-Accelerated Linear Solving in Julia
+
+LinearSolve.jl provides two ways to GPU-accelerate linear solves:
+
+* Offloading: offloading takes a CPU-based problem and automatically transforms it into a
+  GPU-based problem in the background, then returns the solution on the CPU. Thus using
+  offloading requires no change on the part of the user other than choosing an offloading
+  solver.
+* Array type interface: the array type interface requires that the user defines the
+  `LinearProblem` using an `AbstractGPUArray` type and chooses an appropriate solver
+  (or uses the default solver). The solution will then be returned as a GPU array type.
+
+The offloading approach has the advantage of being simpler and requiring no change to
+existing CPU code, while having the disadvantage of additional data-transfer overhead.
+In the following sections we will demonstrate how to use each of the approaches.
+
+!!! warning
+
+    GPUs are not always faster! Your matrices need to be sufficiently large in order for
+    GPU acceleration to actually be faster. For offloading it's around 1,000 x 1,000 matrices
+    and for the array type interface it's around 100 x 100. For sparse matrices, it is highly
+    dependent on the sparsity pattern and the amount of fill-in.
+
+## GPU-Offloading
+
+GPU offloading is simple, as it only requires changing the solver algorithm. Take the
+example from the start of the documentation:
+
+```julia
+using LinearSolve
+
+A = rand(4, 4)
+b = rand(4)
+prob = LinearProblem(A, b)
+sol = solve(prob)
+sol.u
+```
+
+This computation can be moved to the GPU by the following:
+
+```julia
+using CUDA # Add the GPU library
+sol = solve(prob, CudaOffloadFactorization())
+sol.u
+```
+
+## GPUArray Interface
+
+For more manual control over the factorization setup, you can use the
+[GPUArray interface](https://juliagpu.github.io/GPUArrays.jl/dev/), the most common
+instantiation being [CuArray for CUDA-based arrays on NVIDIA GPUs](https://cuda.juliagpu.org/stable/usage/array/).
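+
+Before sending arrays over, it can help to confirm that a usable CUDA GPU is actually
+available in the current session. A minimal check (assuming CUDA.jl is installed) looks like:
+
+```julia
+using CUDA
+
+# `CUDA.functional()` returns `true` when a usable GPU and driver were found
+CUDA.functional()
+```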
+
+To use this interface, we simply send the matrix `A` and the vector `b` over to the GPU and solve:
+
+```julia
+using CUDA
+
+A = rand(4, 4) |> cu
+b = rand(4) |> cu
+prob = LinearProblem(A, b)
+sol = solve(prob)
+sol.u
+```
+
+```
+4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
+ -27.02665
+  16.338171
+ -77.650116
+ 106.335686
+```
+
+Notice that the solution is a `CuArray`, and thus you must use `Array(sol.u)` if you wish to
+return it to the CPU. This setup does no automated memory transfers and will thus only move
+things to the CPU on command.
+
+!!! warning
+
+    Many GPU functionalities, such as `CUDA.cu`, have a built-in preference for `Float32`.
+    Generally it is much faster to use 32-bit floating point operations on GPU than 64-bit
+    operations, and thus this is generally the right choice when targeting such platforms.
+    However, this change in numerical precision needs to be accounted for in your mathematics,
+    as it could lead to instabilities. To disable it, use a constructor that is more specific
+    about the bit size, such as `CuArray{Float64}(A)`. Additionally, preferring more stable
+    factorization methods, such as `QRFactorization()`, can improve the numerics in such cases.
+
+As with other use cases, you can choose the solver, for example:
+
+```julia
+sol = solve(prob, QRFactorization())
+```
+
+## Sparse Matrices on GPUs
+
+Currently, sparse matrix computations on GPUs are only supported for CUDA. This is done using
+the `CUDA.CUSPARSE` sublibrary.
+
+```julia
+using LinearAlgebra, SparseArrays, CUDA.CUSPARSE
+T = Float32
+n = 100
+A_cpu = sprand(T, n, n, 0.05) + I
+x_cpu = zeros(T, n)
+b_cpu = rand(T, n)
+
+A_gpu_csr = CuSparseMatrixCSR(A_cpu)
+b_gpu = CuVector(b_cpu)
+prob = LinearProblem(A_gpu_csr, b_gpu) # define the problem with the GPU sparse arrays
+```
+
+In order to solve such problems using a direct method, you must add
+[CUDSS.jl](https://github.com/exanauts/CUDSS.jl). This looks like:
+
+```julia
+using CUDSS
+sol = solve(prob, LUFactorization())
+```
+
+!!! note
+
+    For now, CUDSS only supports `CuSparseMatrixCSR` type matrices.
+
+Note that `KrylovJL` methods also work with sparse GPU arrays:
+
+```julia
+sol = solve(prob, KrylovJL_GMRES())
+```
+
+Note that CUSPARSE also has some GPU-based preconditioners, such as a built-in `ilu`. For example:
+
+```julia
+sol = solve(prob, KrylovJL_GMRES(precs = (A, p) -> (CUDA.CUSPARSE.ilu02!(A, 'O'), I)))
+```
+
+However, right now CUSPARSE is missing the right `ldiv!` implementation for this to work
+in general. See https://github.com/SciML/LinearSolve.jl/issues/341 for details.
diff --git a/docs/src/tutorials/linear.md b/docs/src/tutorials/linear.md
index 3e531c956..089e0a172 100644
--- a/docs/src/tutorials/linear.md
+++ b/docs/src/tutorials/linear.md
@@ -1,9 +1,7 @@
-# Solving Linear Systems in Julia
+# Getting Started with Solving Linear Systems in Julia
 
-A linear system $$Au=b$$ is specified by defining an `AbstractMatrix` `A`, or
-by providing a matrix-free operator for performing `A*x` operations via the
-function `A(u,p,t)` out-of-place and `A(du,u,p,t)` for in-place. For the sake
-of simplicity, this tutorial will only showcase concrete matrices.
+A linear system $$Au=b$$ is specified by defining an `AbstractMatrix` or `AbstractSciMLOperator`.
+For the sake of simplicity, this tutorial will start by only showcasing concrete matrices.
 
 The following defines a matrix and a `LinearProblem` which is subsequently solved
 by the default linear solver.
@@ -34,4 +32,48 @@ sol.u
 
 Thus, a package which uses LinearSolve.jl simply needs to allow the user to
 pass in an algorithm struct and all wrapped linear solvers are immediately
-available as tweaks to the general algorithm.
+available as tweaks to the general algorithm. For more information on the
+available solvers, see [the solvers page](@ref linearsystemsolvers).
+
+## Sparse and Structured Matrices
+
+There is no difference in the interface for using LinearSolve.jl on sparse
+and structured matrices. For example, the following now uses Julia's
+built-in [SparseArrays.jl](https://docs.julialang.org/en/v1/stdlib/SparseArrays/)
+to define a sparse matrix (`SparseMatrixCSC`) and solve the system using LinearSolve.jl.
+Note that `sprand` is a shorthand for quickly creating a sparse random matrix
+(see SparseArrays.jl for more details on defining sparse matrices).
+
+```@example linsys1
+using LinearSolve, SparseArrays
+
+A = sprand(4, 4, 0.75)
+b = rand(4)
+prob = LinearProblem(A, b)
+sol = solve(prob)
+sol.u
+
+sol = solve(prob, KrylovJL_GMRES()) # Choosing algorithms is done the same way
+sol.u
+```
+
+Similarly, structured matrix types, like banded matrices, can be provided using special matrix
+types. While any `AbstractMatrix` type should be compatible via the general Julia interfaces,
+LinearSolve.jl specifically tests with the following cases:
+
+* [BandedMatrices.jl](https://github.com/JuliaLinearAlgebra/BandedMatrices.jl)
+* [BlockDiagonals.jl](https://github.com/JuliaArrays/BlockDiagonals.jl)
+* [CUDA.jl](https://cuda.juliagpu.org/stable/) (CUDA GPU-based dense and sparse matrices)
+* [FastAlmostBandedMatrices.jl](https://github.com/SciML/FastAlmostBandedMatrices.jl)
+* [Metal.jl](https://metal.juliagpu.org/stable/) (Apple M-series GPU-based dense matrices)
+
+## Using Matrix-Free Operators
+
+In many cases where a sparse matrix becomes very large, even the sparse representation
+cannot be stored in memory. However, in many such cases, for example with PDE discretizations,
+you may be able to write down a function that directly computes `A*x`. These "matrix-free"
+operators allow the user to define the `Ax=b` problem to be solved by giving only the definition
+of `A*x`, allowing specific solvers (Krylov methods) to act without ever constructing
+the full matrix.
+
+**This will be documented in more detail in the near future**
diff --git a/src/factorization.jl b/src/factorization.jl
index cd8f35dd7..84ac5a41d 100644
--- a/src/factorization.jl
+++ b/src/factorization.jl
@@ -117,7 +117,7 @@ function SciMLBase.solve!(cache::LinearCache, alg::LUFactorization; kwargs...)
     end
     cache.cacheval = fact
 
-    if !LinearAlgebra.issuccess(fact)
+    if hasmethod(LinearAlgebra.issuccess, Tuple{typeof(fact)}) && !LinearAlgebra.issuccess(fact)
         return SciMLBase.build_linear_solution(
             alg, cache.u, nothing, cache; retcode = ReturnCode.Failure)
     end
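The guard added in the hunk above calls `LinearAlgebra.issuccess` only when a method exists for
the factorization type, so factorization objects that do not implement `issuccess` no longer
trigger a `MethodError`. A minimal standalone sketch of that pattern (`FakeFactorization` and
`check_success` are hypothetical names used purely for illustration, not LinearSolve.jl API):

```julia
using LinearAlgebra

# Hypothetical stand-in for a factorization object with no `issuccess` method
struct FakeFactorization end

function check_success(fact)
    # Only consult `issuccess` when the factorization type provides it
    if hasmethod(LinearAlgebra.issuccess, Tuple{typeof(fact)}) && !LinearAlgebra.issuccess(fact)
        return :failure
    end
    return :success
end

check_success(FakeFactorization())             # :success (no `issuccess` method to consult)
check_success(lu(zeros(2, 2), check = false))  # :failure (singular matrix, `issuccess` is false)
```

Under this pattern, a factorization type without an `issuccess` method is treated as successful,
and any genuine failure surfaces later when the factorization is actually used.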