Update AD section of performance tips #444


## Choose your AD backend

Automatic differentiation (AD) makes it possible to use modern, efficient gradient-based samplers like NUTS and HMC, and that means a good AD system is incredibly important. Turing currently
supports several AD backends, including [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) (the default), [Zygote](https://github.com/FluxML/Zygote.jl),
[ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl), and [Tracker](https://github.com/FluxML/Tracker.jl). Experimental support is also available for
[Tapir](https://github.com/withbayes/Tapir.jl).

For many common types of models, the default ForwardDiff backend performs great, and there is no need to worry about changing it. However, if you have a need for more speed, you can try
different backends via the standard [ADTypes](https://github.com/SciML/ADTypes.jl) interface by passing an `AbstractADType` to the sampler with the optional `adtype` argument, e.g.
`NUTS(adtype = AutoZygote())`. See [Automatic Differentiation](autodiff) for details. Generally, `adtype = AutoForwardDiff()` is likely to be fastest and most reliable for models with
few parameters (say, less than 20 or so), while reverse-mode backends such as `AutoZygote()` or `AutoReverseDiff()` will perform better for models with many parameters or linear algebra
operations. If in doubt, it's easy to try a few different backends to see how they compare.
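
For example, here is a minimal sketch of switching backends via `adtype` (the `gdemo` model and data are invented for illustration, and it assumes the reverse-mode packages are installed, since they must be loaded before their backends can be used):

```julia
using Turing
using ReverseDiff, Zygote  # reverse-mode backends must be loaded to be available

# A small toy model invented for this example.
@model function gdemo(x, y)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x ~ Normal(m, sqrt(s²))
    y ~ Normal(m, sqrt(s²))
end

model = gdemo(1.5, 2.0)

# Default forward-mode AD: usually best for small models like this one.
chain_fd = sample(model, NUTS(; adtype=AutoForwardDiff()), 1_000)

# Reverse-mode alternatives: compare wall-clock time on your own model.
chain_rd = sample(model, NUTS(; adtype=AutoReverseDiff()), 1_000)
chain_zy = sample(model, NUTS(; adtype=AutoZygote()), 1_000)
```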

### Special care for Zygote and Tracker

Note that Zygote and Tracker will not perform well if your model contains `for`-loops, due to the way reverse-mode AD is implemented in these packages. Zygote also cannot differentiate code
that contains mutating operations. If you cannot implement your model without `for`-loops or mutation, `ReverseDiff` is a better and considerably more performant option. In general, though,
vectorized operations are still likely to perform best.

Avoiding loops can be done using `filldist(dist, N)` and `arraydist(dists)`. `filldist(dist, N)` creates a multivariate distribution that is composed of `N` identical and independent
copies of the univariate distribution `dist` if `dist` is univariate, or it creates a matrix-variate distribution composed of `N` identical and independent copies of the multivariate
distribution `dist` if `dist` is multivariate. `filldist(dist, N, M)` can also be used to create a matrix-variate distribution from a univariate distribution `dist`. `arraydist(dists)`
is similar to `filldist` but it takes an array of distributions `dists` as input. Writing a [custom distribution](advanced) with a custom adjoint is another option to avoid loops.
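
To make the difference concrete, here is a sketch of a toy model written once with a `for`-loop and once with `filldist` (the model and data are invented for illustration):

```julia
using Turing
using Zygote  # needed for the AutoZygote backend below

# Loop version: one `~` statement per observation. Correct, but slow under
# reverse-mode backends such as Zygote or Tracker.
@model function loop_model(y)
    m ~ Normal(0, 1)
    for i in eachindex(y)
        y[i] ~ Normal(m, 1)
    end
end

# Vectorised version: a single multivariate `~` statement built with `filldist`.
@model function vectorised_model(y)
    m ~ Normal(0, 1)
    y ~ filldist(Normal(m, 1), length(y))
end

y = randn(100)
chain = sample(vectorised_model(y), NUTS(; adtype=AutoZygote()), 1_000)
```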

### Special care for ReverseDiff with a compiled tape

For large models, the fastest option is often ReverseDiff with a compiled tape, specified as `adtype=AutoReverseDiff(true)`. However, it is important to note that if your model contains any
branching code, such as `if`-`else` statements, **the gradients from a compiled tape may be inaccurate, leading to erroneous results**. If you use this option for the (considerable) speedup it
can provide, make sure to check your code. It's also a good idea to verify your gradients with another backend.
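
As a rough sketch of what this looks like (the model and data are invented for illustration; the model is branch-free, so a compiled tape should be safe here, and newer ADTypes releases may spell the option `AutoReverseDiff(; compile=true)`):

```julia
using Turing
using ReverseDiff  # must be loaded for the ReverseDiff backend

# A branch-free toy model: no `if`-`else` on random variables, so every call
# runs the same operations and a compiled tape records them all.
@model function many_normals(y)
    m ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1); lower=0)
    y ~ filldist(Normal(m, σ), length(y))
end

y = randn(1_000)

# Compiled tape: often the fastest option for large models, but verify your
# results (and gradients) if there is any branching in the model.
chain = sample(many_normals(y), NUTS(; adtype=AutoReverseDiff(true)), 1_000)
```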

## Ensure that types in your model can be inferred
