diff --git a/tutorials/docs-13-using-turing-performance-tips/performancetips.jmd b/tutorials/docs-13-using-turing-performance-tips/performancetips.jmd
index ce325e000..3cea85cd8 100644
--- a/tutorials/docs-13-using-turing-performance-tips/performancetips.jmd
+++ b/tutorials/docs-13-using-turing-performance-tips/performancetips.jmd
@@ -39,15 +39,33 @@ end
 
 ## Choose your AD backend
 
-Turing currently provides support for two different automatic differentiation (AD) backends.
-Generally, try to use `:forwarddiff` for models with few parameters and `:reversediff`, `:tracker` or `:zygote` for models with large parameter vectors or linear algebra operations. See [Automatic Differentiation](autodiff) for details.
+Automatic differentiation (AD) makes it possible to use modern, efficient gradient-based samplers such as NUTS and HMC, so a good AD system is essential. Turing currently
+supports several AD backends, including [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) (the default), [Zygote](https://github.com/FluxML/Zygote.jl),
+[ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl), and [Tracker](https://github.com/FluxML/Tracker.jl). Experimental support is also available for
+[Tapir](https://github.com/withbayes/Tapir.jl).
 
-## Special care for `:tracker` and `:zygote`
+For many common types of models, the default ForwardDiff backend performs well, and there is no need to change it. However, if you need more speed, you can try
+different backends via the standard [ADTypes](https://github.com/SciML/ADTypes.jl) interface by passing an `AbstractADType` to the sampler with the optional `adtype` argument, e.g.
+`NUTS(adtype = AutoZygote())`. See [Automatic Differentiation](autodiff) for details. Generally, `adtype = AutoForwardDiff()` is likely to be the fastest and most reliable choice for models with
+few parameters (say, fewer than about 20), while reverse-mode backends such as `AutoZygote()` or `AutoReverseDiff()` will perform better for models with many parameters or linear algebra
+operations, because the cost of forward-mode AD grows with the number of parameters, whereas reverse-mode AD obtains the whole gradient from a single backward pass. If in doubt, it's easy to try a few different backends and compare them.
 
-In case of `:tracker` and `:zygote`, it is necessary to avoid loops for now.
-This is mainly due to the reverse-mode AD backends `Tracker` and `Zygote` which are inefficient for such cases. `ReverseDiff` does better but vectorized operations will still perform better.
+### Special care for Zygote and Tracker
 
-Avoiding loops can be done using `filldist(dist, N)` and `arraydist(dists)`. `filldist(dist, N)` creates a multivariate distribution that is composed of `N` identical and independent copies of the univariate distribution `dist` if `dist` is univariate, or it creates a matrix-variate distribution composed of `N` identical and independent copies of the multivariate distribution `dist` if `dist` is multivariate. `filldist(dist, N, M)` can also be used to create a matrix-variate distribution from a univariate distribution `dist`. `arraydist(dists)` is similar to `filldist` but it takes an array of distributions `dists` as input. Writing a [custom distribution](advanced) with a custom adjoint is another option to avoid loops.
+Zygote and Tracker do not perform well if your model contains `for`-loops, due to the way reverse-mode AD is implemented in these packages. Zygote also cannot differentiate code
+that contains mutating operations, such as assigning to an element of an array in place. If you can't implement your model without `for`-loops or mutation, `ReverseDiff` is a better option and is usually much faster. In general, though,
+vectorized operations are still likely to perform best.
+
+Loops can be avoided by using `filldist(dist, N)` and `arraydist(dists)`. `filldist(dist, N)` creates a multivariate distribution that is composed of `N` identical and independent
+copies of the univariate distribution `dist` if `dist` is univariate, or a matrix-variate distribution composed of `N` identical and independent copies of the multivariate
+distribution `dist` if `dist` is multivariate. `filldist(dist, N, M)` can also be used to create an `N`-by-`M` matrix-variate distribution from a univariate distribution `dist`. `arraydist(dists)`
+is similar to `filldist`, but it takes an array of (possibly different) distributions `dists` as input. Writing a [custom distribution](advanced) with a custom adjoint is another way to avoid loops.
+
+### Special care for ReverseDiff with a compiled tape
+
+For large models, the fastest option is often ReverseDiff with a compiled tape, specified as `adtype = AutoReverseDiff(compile = true)`. However, if your model contains branching code whose outcome can depend on the parameter values,
+such as `if`-`else` statements, **the gradients from a compiled tape may be inaccurate, leading to erroneous results**. This is because the tape records the operations executed for one particular set of parameter values and does not re-evaluate the branch conditions. If you use this option for the (considerable) speedup it
+can provide, check your model carefully for such branches. It's also a good idea to verify your gradients with another backend.
 
 ## Ensure that types in your model can be inferred
 