Address comments of Discourse #600

Merged 3 commits on Apr 26, 2025
6 changes: 4 additions & 2 deletions docs/pages.jl
@@ -1,9 +1,11 @@
# Put in a separate page so it can be used by SciMLDocs.jl

pages = ["index.md",
"Tutorials" => Any["tutorials/linear.md",
"tutorials/linear.md",
"Tutorials" => Any[
"tutorials/caching_interface.md",
"tutorials/accelerating_choices.md"],
"tutorials/accelerating_choices.md",
"tutorials/gpu.md"],
"Basics" => Any["basics/LinearProblem.md",
"basics/common_solver_opts.md",
"basics/OperatorAssumptions.md",
3 changes: 2 additions & 1 deletion docs/src/tutorials/accelerating_choices.md
@@ -78,7 +78,8 @@ should be thought about:
3. A Krylov subspace method with proper preconditioning will be better than direct solvers
when the matrices get large enough. You could always precondition a sparse matrix with
iLU as an easy choice, though the tolerance would need to be tuned in a problem-specific
-way.
+way. Please see the [preconditioners page](https://docs.sciml.ai/LinearSolve/stable/basics/Preconditioners/)
+for more information on defining and using preconditioners. A sketch of this setup is shown below.
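
Below is a minimal, hedged sketch of that setup. It uses IncompleteLU.jl for the incomplete LU
factorization and the `precs` option of `KrylovJL_GMRES`; the matrix and the drop tolerance
`τ = 0.01` are made-up illustrations that would need problem-specific tuning.

```julia
using LinearSolve, LinearAlgebra, SparseArrays, IncompleteLU

n = 100
A = sprand(n, n, 0.05) + 10I   # hypothetical sparse, diagonally dominated system
b = rand(n)
prob = LinearProblem(A, b)

# Left preconditioner from an incomplete LU factorization, identity on the right
ilu_precs(A, p) = (ilu(A, τ = 0.01), I)

sol = solve(prob, KrylovJL_GMRES(precs = ilu_precs))
sol.u
```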

!!! note

135 changes: 135 additions & 0 deletions docs/src/tutorials/gpu.md
@@ -0,0 +1,135 @@
# GPU-Accelerated Linear Solving in Julia

LinearSolve.jl provides two ways to GPU-accelerate linear solves:

* Offloading: offloading takes a CPU-based problem, automatically transforms it into a
  GPU-based problem in the background, and returns the solution on the CPU. Thus using
  offloading requires no change on the part of the user other than choosing an offloading
  solver.
* Array type interface: the array type interface requires that the user define the
  `LinearProblem` using an `AbstractGPUArray` type and choose an appropriate solver
  (or use the default solver). The solution will then be returned as a GPU array type.

The offloading approach has the advantage of being simpler and requiring no change to
existing CPU code, while having the disadvantage of higher overhead. The following
sections demonstrate how to use each approach.

!!! warning

    GPUs are not always faster! Your matrices need to be sufficiently large in order for
    GPU acceleration to actually be faster. For offloading it's around 1,000 x 1,000 matrices
    and for the array type interface it's around 100 x 100. For sparse matrices, it is highly
    dependent on the sparsity pattern and the amount of fill-in.

## GPU-Offloading

GPU offloading is simple, as it only requires changing the solver algorithm. Take the
example from the start of the documentation:

```julia
using LinearSolve

A = rand(4, 4)
b = rand(4)
prob = LinearProblem(A, b)
sol = solve(prob)
sol.u
```

This computation can be moved to the GPU by the following:

```julia
using CUDA # Add the GPU library
sol = solve(prob, CudaOffloadFactorization())
sol.u
```
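
Since GPUs only pay off for sufficiently large problems (see the warning above), it can be
worth checking where the crossover happens on your own hardware. Below is a minimal, hedged
sketch of such a check using BenchmarkTools.jl; the 1,000 x 1,000 size is just an
illustrative starting point.

```julia
using LinearSolve, CUDA, BenchmarkTools

n = 1000
A = rand(n, n)
b = rand(n)
prob = LinearProblem(A, b)

@btime solve($prob)                              # default CPU solve
@btime solve($prob, CudaOffloadFactorization())  # GPU-offloaded solve
```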

## GPUArray Interface

For more manual control over the factorization setup, you can use the
[GPUArray interface](https://juliagpu.github.io/GPUArrays.jl/dev/), the most common
instantiation being [CuArray for CUDA-based arrays on NVIDIA GPUs](https://cuda.juliagpu.org/stable/usage/array/).
To use this, we simply send the matrix `A` and the vector `b` over to the GPU and solve:

```julia
using CUDA

A = rand(4, 4) |> cu
b = rand(4) |> cu
prob = LinearProblem(A, b)
sol = solve(prob)
sol.u
```

```
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
-27.02665
16.338171
-77.650116
106.335686
```

Notice that the solution is a `CuArray`, and thus you must use `Array(sol.u)` if you wish
to return it to the CPU. This setup does no automated memory transfers and will thus only
move data to the CPU on command.
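
For example, copying the result back to host memory is a single explicit conversion
(this is just the standard `Array` conversion applied to the `CuArray` result):

```julia
u_cpu = Array(sol.u)   # explicitly copy the GPU result back to host memory
```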

!!! warning

    Many GPU functionalities, such as `CUDA.cu`, have a built-in preference for `Float32`.
    It is generally much faster to use 32-bit floating point operations on GPUs than 64-bit
    operations, so this is usually the right choice when moving to such platforms. However,
    this change in numerical precision needs to be accounted for in your mathematics, as it
    could lead to instabilities. To disable this, use a constructor that is more specific
    about the bit size, such as `CuArray{Float64}(A)`. Additionally, preferring more stable
    factorization methods, such as `QRFactorization()`, can improve the numerics in such
    cases; a sketch combining both is shown below.

As with other use cases, you can choose the solver explicitly, for example:

```julia
sol = solve(prob, QRFactorization())
```
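
Combining these, a minimal sketch (assuming CUDA.jl is available) of keeping `Float64`
precision on the GPU together with a more stable factorization might look like:

```julia
using LinearSolve, CUDA

A = rand(4, 4)
b = rand(4)

# Request Float64 GPU storage explicitly instead of the Float32 default of `cu`
A_gpu = CuArray{Float64}(A)
b_gpu = CuArray{Float64}(b)

prob = LinearProblem(A_gpu, b_gpu)
sol = solve(prob, QRFactorization())
Array(sol.u)   # copy back to the CPU if needed
```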

## Sparse Matrices on GPUs

Currently, sparse matrix computations on GPUs are only supported for CUDA. This is done using
the `CUDA.CUSPARSE` sublibrary.

```julia
using LinearSolve, LinearAlgebra, SparseArrays, CUDA, CUDA.CUSPARSE

T = Float32
n = 100
A_cpu = sprand(T, n, n, 0.05) + I
x_cpu = zeros(T, n)
b_cpu = rand(T, n)

# Move the data to the GPU in CSR format
A_gpu_csr = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)

# Define the linear problem with the GPU sparse matrix and right-hand side
prob = LinearProblem(A_gpu_csr, b_gpu)
```

In order to solve such problems using a direct method, you must add
[CUDSS.jl](https://github.com/exanauts/CUDSS.jl). This looks like:

```julia
using CUDSS
sol = solve(prob, LUFactorization())
```

!!! note

    For now, CUDSS only supports `CuSparseMatrixCSR` type matrices.

Note that `KrylovJL` methods also work with sparse GPU arrays:

```julia
sol = solve(prob, KrylovJL_GMRES())
```

Note that CUSPARSE also has some GPU-based preconditioners, such as a built-in `ilu`. For example:

```julia
sol = solve(prob, KrylovJL_GMRES(precs = (A, p) -> (CUDA.CUSPARSE.ilu02!(A, 'O'), I)))
```

However, right now CUSPARSE is missing the right `ldiv!` implementation for this to work
in general. See https://github.com/SciML/LinearSolve.jl/issues/341 for details.
54 changes: 48 additions & 6 deletions docs/src/tutorials/linear.md
@@ -1,9 +1,7 @@
-# Solving Linear Systems in Julia
+# Getting Started with Solving Linear Systems in Julia

-A linear system $$Au=b$$ is specified by defining an `AbstractMatrix` `A`, or
-by providing a matrix-free operator for performing `A*x` operations via the
-function `A(u,p,t)` out-of-place and `A(du,u,p,t)` for in-place. For the sake
-of simplicity, this tutorial will only showcase concrete matrices.
+A linear system $$Au=b$$ is specified by defining an `AbstractMatrix` or `AbstractSciMLOperator`.
+For the sake of simplicity, this tutorial will start by only showcasing concrete matrices.

The following defines a matrix and a `LinearProblem` which is subsequently solved
by the default linear solver.
@@ -34,4 +32,48 @@ sol.u

Thus, a package which uses LinearSolve.jl simply needs to allow the user to
pass in an algorithm struct and all wrapped linear solvers are immediately
-available as tweaks to the general algorithm.
+available as tweaks to the general algorithm. For more information on the
+available solvers, see [the solvers page](@ref linearsystemsolvers).

## Sparse and Structured Matrices

There is no difference in the interface for using LinearSolve.jl on sparse
and structured matrices. For example, the following now uses Julia's
built-in [SparseArrays.jl](https://docs.julialang.org/en/v1/stdlib/SparseArrays/)
to define a sparse matrix (`SparseMatrixCSC`) and solve the system using LinearSolve.jl.
Note that `sprand` is a shorthand for quickly creating a sparse random matrix
(see SparseArrays.jl for more details on defining sparse matrices).

```@example linsys1
using LinearSolve, SparseArrays

A = sprand(4, 4, 0.75)
b = rand(4)
prob = LinearProblem(A, b)
sol = solve(prob)
sol.u

sol = solve(prob, KrylovJL_GMRES()) # Choosing algorithms is done the same way
sol.u
```

Similarly, structured matrix types, like banded matrices, can be provided using special matrix
types. While any `AbstractMatrix` type should be compatible via the general Julia interfaces,
LinearSolve.jl specifically tests with the following cases (a minimal sketch using
BandedMatrices.jl follows the list):

* [BandedMatrices.jl](https://github.com/JuliaLinearAlgebra/BandedMatrices.jl)
* [BlockDiagonals.jl](https://github.com/JuliaArrays/BlockDiagonals.jl)
* [CUDA.jl](https://cuda.juliagpu.org/stable/) (CUDA GPU-based dense and sparse matrices)
* [FastAlmostBandedMatrices.jl](https://github.com/SciML/FastAlmostBandedMatrices.jl)
* [Metal.jl](https://metal.juliagpu.org/stable/) (Apple M-series GPU-based dense matrices)
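
As an illustration, here is a minimal sketch assuming the diagonal-pair constructor of
BandedMatrices.jl; the tridiagonal system below is a made-up example, and any banded
structure would work the same way:

```julia
using LinearSolve, BandedMatrices

# A hypothetical 10 x 10 tridiagonal banded matrix built from its diagonals
A = BandedMatrix(-1 => fill(1.0, 9), 0 => fill(4.0, 10), 1 => fill(1.0, 9))
b = rand(10)

prob = LinearProblem(A, b)
sol = solve(prob)   # solved with the default algorithm choice
sol.u
```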

## Using Matrix-Free Operators

In many cases where a sparse matrix gets really large, even the sparse representation
cannot be stored in memory. However, in many such cases, for example with PDE discretizations,
you may be able to write down a function that directly computes `A*x`. These "matrix-free"
operators allow the user to define the `Ax=b` problem to be solved by giving only the definition
of `A*x`, allowing specific solvers (Krylov methods) to act without ever constructing
the full matrix.

**This will be documented in more detail in the near future**
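
In the meantime, here is a minimal, hedged sketch using `FunctionOperator` from
SciMLOperators.jl. The diagonal action below is a made-up stand-in for whatever `A*x`
computation your application provides, and the exact `FunctionOperator` options may differ
between SciMLOperators.jl versions:

```julia
using LinearSolve, SciMLOperators

n = 10
d = collect(1.0:n)   # hypothetical data defining the action of A

# In-place and out-of-place definitions of the matrix-free action u -> A*u
apply_A(du, u, p, t) = (du .= d .* u)
apply_A(u, p, t) = d .* u

A_op = FunctionOperator(apply_A, zeros(n), zeros(n))
b = rand(n)

prob = LinearProblem(A_op, b)
sol = solve(prob, KrylovJL_GMRES())   # Krylov methods only need A*u products
sol.u
```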
2 changes: 1 addition & 1 deletion src/factorization.jl
@@ -117,7 +117,7 @@ function SciMLBase.solve!(cache::LinearCache, alg::LUFactorization; kwargs...)
end
cache.cacheval = fact

-if !LinearAlgebra.issuccess(fact)
+if hasmethod(LinearAlgebra.issuccess, Tuple{typeof(fact)}) && !LinearAlgebra.issuccess(fact)
return SciMLBase.build_linear_solution(
alg, cache.u, nothing, cache; retcode = ReturnCode.Failure)
end