VariableOrderAccumulator #940


Draft · wants to merge 4 commits into base: breaking

Conversation

@mhauru (Member) commented May 29, 2025

Removes the order field of Metadata in favour of having an OrderedDict{VarName,Int} in the same accumulator as num_produce (renaming NumProduceAccumulator to VariableOrderAccumulator in the process). Also adds some == methods we were previously missing.

This is currently passing tests, except anything related to JET. I think JET freaks out because the OrderedDict within the new accumulator has an abstract key type. I think the abstract key type is fine as long as the value type is concrete, at least once we remove VariableOrderAccumulator from the set of default accumulators and only use it when doing ParticleGibbs. I'm thus tempted not to fix the JET issues and instead move this whole accumulator from DPPL to the part of Turing.jl that interfaces with AdvancedPS. Not sure how to handle merging this PR in that case, though.
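For reference, a rough sketch of the shape the new accumulator might take (the field names here are assumptions; the PR's actual definition may differ):

using OrderedCollections: OrderedDict
using AbstractPPL: VarName

# Hypothetical sketch, not the PR's exact definition.
struct VariableOrderAccumulator
    num_produce::Int
    # Abstract key type (VarName is not concrete), but concrete value type.
    order::OrderedDict{VarName,Int}
end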

Comment on lines +169 to +171
function Base.:(==)(vi1::VarInfo, vi2::VarInfo)
return (vi1.metadata == vi2.metadata && vi1.accs == vi2.accs)
end
mhauru (Member Author):

In making this PR I learned that the default implementation of == for structs is, in effect,

function Base.:(==)(vi1::VarInfo, vi2::VarInfo)
    return (vi1.metadata === vi2.metadata && vi1.accs === vi2.accs)
end

i.e. all the fields are compared with === even when calling ==. That was causing trouble with some tests that did == comparisons of SimpleVarInfos. So note that before this PR e.g. VarInfo() != VarInfo(), whereas now VarInfo() == VarInfo().
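To illustrate the fallback behaviour, a minimal standalone sketch (Wrapper is a made-up type, not part of DynamicPPL):

# Wrapper is hypothetical, used only to demonstrate the fallback.
struct Wrapper
    d::Dict{Symbol,Int}
end

a = Wrapper(Dict(:x => 1))
b = Wrapper(Dict(:x => 1))

a == b       # false: the generic fallback for == is ===, and the two
             # Dict fields are distinct mutable objects
a.d == b.d   # true: Dict defines its own structural ==

Base.:(==)(w1::Wrapper, w2::Wrapper) = w1.d == w2.d
a == b       # true, once a field-wise == is defined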

Contributor:

Benchmark Report for Commit 0b12781

Computer Information

Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | forwarddiff |             typed |  false |                108.2 |                 1.0 |
|           Smorgasbord |       201 | forwarddiff |             typed |  false |               3138.4 |                23.0 |
|           Smorgasbord |       201 | forwarddiff | simple_namedtuple |   true |               1892.5 |                22.5 |
|           Smorgasbord |       201 | forwarddiff |           untyped |   true |               3852.3 |                21.2 |
|           Smorgasbord |       201 | forwarddiff |       simple_dict |   true |               7388.1 |                21.1 |
|           Smorgasbord |       201 | reversediff |             typed |   true |               4489.2 |                 9.6 |
|           Smorgasbord |       201 |    mooncake |             typed |   true |               3577.6 |                10.3 |
|    Loop univariate 1k |      1000 |    mooncake |             typed |   true |              31079.0 |                10.0 |
|       Multivariate 1k |      1000 |    mooncake |             typed |   true |               1131.9 |                 8.0 |
|   Loop univariate 10k |     10000 |    mooncake |             typed |   true |             353565.8 |                11.8 |
|      Multivariate 10k |     10000 |    mooncake |             typed |   true |               8945.4 |                 9.3 |
|               Dynamic |        10 |    mooncake |             typed |   true |                299.3 |                13.8 |
|              Submodel |         1 |    mooncake |             typed |   true |                111.8 |                 6.0 |
|                   LDA |        12 | reversediff |             typed |   true |               1479.9 |                 1.8 |

@@ -1808,13 +1800,12 @@ function BangBang.push!!(vi::VarInfo, vn::VarName, r, dist::Distribution)
[1:length(val)],
val,
[dist],
[get_num_produce(vi)],
mhauru (Member Author):

This is a change in behaviour: previously, calling push!! automatically set the order for a variable. Now the order is set only if the push!! takes place within tilde_assume!!. Options for this are:

  1. Say that it's the caller's responsibility to call setorder!! after push!!. This could be fine because only ParticleGibbs cares about order.
  2. Add an extra hook for accumulators that gets called on every push!! call, so that they can adjust their state accordingly (roughly sketched below).

If this is only relevant for VariableOrderAccumulator then I'd lean towards 1. If it comes up with other accumulators too then 2. might be warranted.

Similar considerations apply to at least push!, merge, and subset, which after this PR might result in out-of-sync VariableOrderAccumulators.
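For concreteness, option 2 might look roughly like the following; every name here is hypothetical, and map_accumulators!! stands in for whatever mechanism applies a function to each accumulator:

# Hypothetical hook; none of these names are existing DPPL API.
# Default: most accumulators ignore push!!.
on_push!!(acc, vn) = acc

# VariableOrderAccumulator records the current num_produce for the new variable.
function on_push!!(acc::VariableOrderAccumulator, vn)
    acc.order[vn] = acc.num_produce
    return acc
end

# push!! would then invoke the hook on every accumulator after inserting vn,
# along the lines of:
#   vi = insert_variable!!(vi, vn, r, dist)
#   vi = map_accumulators!!(acc -> on_push!!(acc, vn), vi)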

penelopeysm (Member):

> If this is only relevant for VariableOrderAccumulator then I'd lean towards 1.

I think it's PG's responsibility to call setorder correctly, rather than DPPL's, so I'd agree.

> Similar considerations apply to at least push!, merge, and subset

I still think it should be handled in PG, not here. I assume we could write functions like

function pg_push!!(...)
    vi = push!!(...)
    return setorder!!(...)
end

and make sure to always use that in the PG code?
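Fleshing that sketch out slightly, with hypothetical argument names (push!! and get_num_produce appear elsewhere in this PR; the exact setorder!! signature is an assumption):

# Hypothetical wrapper living in Turing.jl's AdvancedPS glue code;
# argument names and the setorder!! signature are illustrative.
function pg_push!!(vi, vn, r, dist)
    vi = push!!(vi, vn, r, dist)
    # push!! no longer records the order, so PG sets it itself.
    return setorder!!(vi, vn, get_num_produce(vi))
end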

mhauru (Member Author):

I'm happy with that, as long as it doesn't turn out that this is a common need for accumulators. One other instance comes to mind: currently, if you have a PointwiseLogDensityAccumulator in your varinfo and you subset or merge, the pointwise log densities don't get subsetted/merged, and you end up with an accumulator that tracks different variables from the varinfo. This is inconsequential because the use of PointwiseLogDensityAccumulator is confined to the one function that needs it.

I'm happy to make PG deal with this, but let's keep our eyes open in case this comes up with other accumulators.

penelopeysm (Member):

> PointwiseLogDensityAccumulator in your varinfo and you subset or merge

Ah, I see -- this would have been true in the past as well, with PointwiseLogDensityContext tracking different things from the subsetted varinfo, right?

mhauru (Member Author):

Yep. I don't think PLDAccumulator by itself is a good enough argument for making these subset and merge functions, but it did make me wonder whether this is a more common pattern with accumulators than we would at first assume. It's easy to leave them out now and add them later if needed, though.

@mhauru mhauru requested a review from penelopeysm May 29, 2025 16:00
@mhauru (Member Author) commented May 29, 2025

Benchmark times indicate a horrendous loss of type stability. Will investigate, probably tomorrow.

@penelopeysm (Member) commented

> Not sure how to handle merging this PR in that case though.

I had similar problems with other PRs. How about this?

  1. Make sure we're happy with the code, then drop it from the default accumulators and release a new minor version of DPPL. This will break upstream PG.
  2. Fix PG to work with it, and release a new version of Turing.
  3. Find code that can be moved from DPPL, and move it to Turing.

@mhauru (Member Author) commented May 30, 2025

Is there a particular reason to first drop it from default accumulators and then move it to Turing.jl, rather than doing both in one go?

Also, regardless of what we do, I would develop the corresponding Turing.jl release in parallel, to avoid having to make a lot of patch DPPL releases when we realise we are missing something. I've started that work in TuringLang/Turing.jl#2550, but not yet for VariableOrderAccumulator.

@penelopeysm (Member) commented May 30, 2025

Because it's annoyingly difficult to make Turing CI run with an unreleased version of DPPL, short of committing a test/Manifest.toml. There's the new [sources] thing that lets you point to unreleased versions, but it's 1.11 only, so the 1.10 tests will still need a Manifest. But I suppose if you're willing to run tests locally, that's fine (and maybe now that the tests are faster it's less unpalatable -- I've always hated running tests locally for various reasons, the time being one of them and fiddling with imports and such being another).
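For reference, a [sources] entry of the kind mentioned above would look something like this in the test environment's Project.toml (the branch name is illustrative):

# Hypothetical entry; repo URL is real, branch name is illustrative.
[sources]
DynamicPPL = {url = "https://github.com/TuringLang/DynamicPPL.jl", rev = "breaking"}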

(I don't think patch releases are really problematic, but there is always the possibility of having to make multiple minor releases to fix bugs, so I see the point)

@mhauru (Member Author) commented May 30, 2025

The performance problem turned out not to be type stability, but rather that every call to unflatten (which happens with every call to logdensity) resulted in a call to deepcopy(::OrderedDict) in VariableOrderAccumulator, and those, it seems, are really slow. I've replaced OrderedDict with Dict (didn't really need the ordering anyway) and started using copy rather than deepcopy; let's see what that does to the benchmarks. (Seems like it makes them crash...)
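A quick way to see the gap between the two, assuming BenchmarkTools is available (the dict size and keys here are purely illustrative):

using BenchmarkTools

d = Dict(Symbol(:x, i) => i for i in 1:100)

@btime copy($d)      # shallow copy: allocates one new Dict, shares keys/values
@btime deepcopy($d)  # recursively copies the whole object graph; much slower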

Two thoughts:

  • We should probably go over the codebase and replace a lot of uses of deepcopy with copy, because deepcopy is bad practice.
  • VariableOrderAccumulator would be another use case for VarNameTuple or some such data structure.
