Description
We have a set of small benchmarks to quickly test our code in RxInfer. The aim of the package is to run efficient Bayesian inference, potentially on low-power, low-memory devices such as a Raspberry Pi. We just noticed that on Julia 1.10 we have quite a noticeable GC regression. Consider this notebook. It is not an MWE, but it computes Bayesian posteriors in a simple linear Gaussian state-space probabilistic model (sketched below). There are two settings:
- Filtering: for each time step $t$, use observations up to the time step $t$.
- Smoothing: for each time step $t$, use observations up to the time step $T > t$.
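For reference, here is a minimal sketch of what such a model looks like (the exact transition and priors are defined in the notebook; a random-walk transition is shown here only for concreteness):

$$x_t \sim \mathcal{N}(x_{t-1}, w), \qquad y_t \sim \mathcal{N}(x_t, v),$$

where filtering computes the posteriors $p(x_t \mid y_{1:t})$ and smoothing computes $p(x_t \mid y_{1:T})$ with $T > t$. Both have closed-form Gaussian solutions, which is why no sampling is needed.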
Here are the results on the current Julia release
julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
julia> @benchmark run_filtering($datastream, $n, $v)
BenchmarkTools.Trial: 1504 samples with 1 evaluation.
Range (min … max): 2.633 ms … 13.932 ms ┊ GC (min … max): 0.00% … 69.28%
Time (median): 3.073 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.319 ms ± 1.058 ms ┊ GC (mean ± σ): 7.08% ± 13.05%
▅▇▇██▇▅▃▂ ▁
██████████▇▇▅▇█▇▇▅▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▇██▇▆█▇▆▄▄▅ █
2.63 ms Histogram: log(frequency) by time 7.92 ms <
Memory estimate: 2.35 MiB, allocs estimate: 63823.
julia> @benchmark run_smoothing($data, $n, $v)
BenchmarkTools.Trial: 288 samples with 1 evaluation.
Range (min … max): 13.868 ms … 29.987 ms ┊ GC (min … max): 0.00% … 35.63%
Time (median): 15.545 ms ┊ GC (median): 0.00%
Time (mean ± σ): 17.411 ms ± 3.975 ms ┊ GC (mean ± σ): 10.81% ± 14.33%
▄▃█▁▄▅▁
▇███████▇▆▅▅▃▃▄▂▄▃▂▁▃▃▁▁▁▁▁▁▁▁▁▂▃▅▃▃▅▅▄▃▂▃▃▃▃▃▂▂▁▄▂▁▃▁▁▂▂▂▂ ▃
13.9 ms Histogram: frequency by time 28.4 ms <
Memory estimate: 10.05 MiB, allocs estimate: 220417.
Here are the results on 1.10-beta1
julia> versioninfo()
Julia Version 1.10.0-beta1
Commit 6616549950e (2023-07-25 17:43 UTC)
julia> @benchmark run_filtering($datastream, $n, $v)
BenchmarkTools.Trial: 1308 samples with 1 evaluation.
Range (min … max): 3.260 ms … 78.207 ms ┊ GC (min … max): 0.00% … 94.71%
Time (median): 3.479 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.818 ms ± 3.293 ms ┊ GC (mean ± σ): 6.64% ± 7.41%
▄▆██▅▁
▂▃▄▇██████▇▅▅▃▃▃▃▃▂▂▃▃▃▃▃▃▂▃▃▂▂▂▁▂▂▂▂▁▂▁▂▂▁▂▂▁▁▁▂▂▂▂▂▂▂▂▂▂ ▃
3.26 ms Histogram: frequency by time 4.94 ms <
Memory estimate: 2.51 MiB, allocs estimate: 69824.
julia> @benchmark run_smoothing($data, $n, $v)
BenchmarkTools.Trial: 291 samples with 1 evaluation.
Range (min … max): 15.160 ms … 88.841 ms ┊ GC (min … max): 0.00% … 79.71%
Time (median): 15.757 ms ┊ GC (median): 0.00%
Time (mean ± σ): 17.336 ms ± 7.862 ms ┊ GC (mean ± σ): 7.05% ± 11.57%
█▅▁
█████▇▄▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅▄ ▅
15.2 ms Histogram: log(frequency) by time 57.9 ms <
Memory estimate: 10.12 MiB, allocs estimate: 222915.
As you can see, in the case of run_filtering the maximum time jumped from 13 ms to 78 ms, and the GC max also indicates a jump from 69% to 94%. In the case of run_smoothing the situation is similar: the maximum time jumped from 29 ms to 88 ms, and the GC max jumped from 35% to 79%.
The inference procedure allocates a lot of intermediate "messages" in the form of distributions from the Distributions.jl package, but does not use any sampling. Instead, it computes analytical solutions for the posteriors. These analytical solutions also rely on dynamic multiple dispatch in many places. Eliminating dynamic multiple dispatch is not really an option; it is just how the package works, and it has been quite efficient until now.
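To give a flavor of the allocation pattern, here is a hypothetical standalone sketch (not the actual RxInfer/ReactiveMP code; multiply_messages and filtering_sketch are made-up names for illustration). Every update allocates a few small immutable Distributions.jl objects, and in the real code the concrete message types are only known at runtime, so the combination rule is chosen by dynamic dispatch:

```julia
using Distributions

# Closed-form product of two Gaussian messages: allocates one new small Normal.
function multiply_messages(a::Normal, b::Normal)
    w1, w2 = inv(var(a)), inv(var(b))
    v = inv(w1 + w2)
    return Normal(v * (w1 * mean(a) + w2 * mean(b)), sqrt(v))
end

# Filtering-like loop: several short-lived allocations per time step.
function filtering_sketch(observations, v)
    posterior = Normal(0.0, 10.0)
    for y in observations
        prior      = Normal(mean(posterior), sqrt(var(posterior) + 1.0)) # predict step
        likelihood = Normal(y, sqrt(v))                                  # observation message
        posterior  = multiply_messages(prior, likelihood)                # analytical update
        # `prior` and `likelihood` become garbage immediately after the update
    end
    return posterior
end
```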
The major difference between the two functions is that run_filtering allocates a lot of information (messages) that it does not use afterwards, so it could probably be freed right away, while run_smoothing retains/stores this information until the end of the procedure. You can also see that the resulting minimum execution time is worse in both cases.
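For contrast, a similarly hypothetical smoothing-like sketch, in which every intermediate posterior is retained, so these allocations stay reachable until the end of the pass and cannot be collected early:

```julia
using Distributions

# Smoothing-like loop: the same per-step allocations, but every posterior is
# pushed into a vector that stays alive until the whole pass is finished.
function smoothing_sketch(observations, v)
    posteriors = Normal{Float64}[]
    posterior  = Normal(0.0, 10.0)
    for y in observations
        prior      = Normal(mean(posterior), sqrt(var(posterior) + 1.0))
        likelihood = Normal(y, sqrt(v))
        w1, w2     = inv(var(prior)), inv(var(likelihood))
        posterior  = Normal((w1 * mean(prior) + w2 * mean(likelihood)) / (w1 + w2),
                            sqrt(inv(w1 + w2)))
        push!(posteriors, posterior)   # retained until the end of the procedure
    end
    return posteriors
end
```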
I think this is quite a severe regression, especially for the filtering case, which is supposed to run real-time Bayesian inference with as few GC pauses as possible. We can of course refine our code base, but in the meantime, can this be improved in general? What can cause it? How should we proceed and debug it? How can we help figure this out further?
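We are of course happy to collect more data on our side. For example, we could enable GC logging and compare the pause pattern between 1.9.2 and 1.10-beta1 on the same workload, roughly like this (run_filtering, datastream, n and v are the same as in the benchmarks above):

```julia
# Print a line for every collection while the workload runs (Julia >= 1.8).
GC.enable_logging(true)
run_filtering(datastream, n, v)
GC.enable_logging(false)

# Coarser per-run numbers are also available programmatically.
before = Base.gc_num()
run_filtering(datastream, n, v)
stats = Base.GC_Diff(Base.gc_num(), before)
@show stats.allocd stats.total_time stats.pause stats.full_sweep
```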
julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin22.4.0)
CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
Threads: 1 on 12 virtual cores