Skip to content

Update loglik and re-enable its tests #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 10, 2021

Conversation

brandonwillard
Copy link
Member

@brandonwillard brandonwillard commented Jun 5, 2021

This PR puts loglik back into working order.

  • Traverse toposorted graph and build the sum of log-likelihoods
  • Derive RandonVariable forms for observed non-RandonVariable Ops

@brandonwillard brandonwillard added the important This label is used to indicate priority over things not given this label label Jun 5, 2021
@brandonwillard brandonwillard self-assigned this Jun 5, 2021
@brandonwillard brandonwillard changed the title Update loglik implementation and re-enable its tests Update loglik and re-enable its tests Jun 5, 2021
@codecov
Copy link

codecov bot commented Jun 5, 2021

Codecov Report

Merging #3 (9979344) into main (7fae212) will increase coverage by 11.89%.
The diff coverage is 90.70%.

Impacted file tree graph

@@             Coverage Diff             @@
##             main       #3       +/-   ##
===========================================
+ Coverage   64.85%   76.74%   +11.89%     
===========================================
  Files           3        5        +2     
  Lines         552      886      +334     
===========================================
+ Hits          358      680      +322     
- Misses        194      206       +12     
Impacted Files Coverage Δ
aeppl/opt.py 65.57% <65.57%> (ø)
aeppl/joint_logprob.py 88.63% <88.63%> (ø)
aeppl/printing.py 95.04% <95.04%> (ø)
aeppl/logprob.py 94.52% <100.00%> (ø)
aeppl/_version.py 44.40% <0.00%> (+3.24%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7fae212...9979344. Read the comment docs.

@brandonwillard brandonwillard marked this pull request as draft June 5, 2021 04:33
@brandonwillard
Copy link
Member Author

Now that loglik walks the entire graph and computes log-likelihoods, we need to perform the transforms (i.e. the ones that allow us to compute log-likelihoods for operations on RandomVariables) before walking the graph.

The problem with the current approach is that it reintroduces log-likelihood terms into the graph: the untransformed ones and the transformed ones.

We could use the current approach, but we would need to walk the graph backwards (i.e. outputs to inputs) and keep track of which nodes were transformed and ignore their old counterpart terms.

@brandonwillard brandonwillard marked this pull request as ready for review June 6, 2021 06:58
aeppl/loglik.py Outdated

q = deque(io_toposort(graph_inputs((rv_var,)), (rv_var,)))

logpdf_var = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logpdf_var = 0 ? No need for the if statement below, then.I assume a 0 + is optimized away automatically...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be removed during canonicalization, but then it's just a matter of when and where you want to do that work. Since it doesn't really cost us anything to do that here, I would rather avoid introducing terms that would necessarily be removed.

aeppl/loglik.py Outdated
# Let's try deriving a log-likelihood from this
# non-`RandonVariable` node
q_logpdf_parts = []
for q_rv_var in node.outputs:
Copy link
Contributor

@ricardoV94 ricardoV94 Jun 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with any multi-output nodes, is it safe to say the loglike can always be dealt independently for these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop should handle all cases just fine. The important thing is that we make sure to only add the log-likelihoods for the outputs that are supposed to be in the log-likelihood we're computing. The replacements check is supposed to do that, but, since other things could be in there, it might not be the best approach.

I'm about to push a docstring update to loglik that should explain exactly what this function is supposed to do (and why we should probably rename it).

Copy link
Contributor

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine, knowing the PyMC3 version is working.

More generally, I have the following vague (and confused) thought:

Do we need to treat the RandomVariables differently as we walk down the graph? I wonder if we could just call _loglik on all nodes equally, add them up and, once the logp graph is finished, replace all the random variables by the respective value_vars (only once at the root level of the loglik call).

@ricardoV94
Copy link
Contributor

Now that loglik walks the entire graph and computes log-likelihoods, we need to perform the transforms (i.e. the ones that allow us to compute log-likelihoods for operations on RandomVariables) before walking the graph.

Which transforms are you referring to? Variable transforms, or graph transforms / rewrites? I am guessing the latter.

aeppl/loglik.py Outdated
if q_rv_value_var is None:
continue

# TODO: We need to prevent this from re-adding terms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the TODO concern here? Do you have a MWE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment is referring to the cause of the failure in test_loglik_subtensor.

When the I = bernoulli(p) is lifted through the Y = normal(mu, sigma) in the graph of Y[I], producing Y_new = normal(mu[I], sigma[I]), the resulting log-likelihood ends up being something like log(p(I) * (p(Y_new) * p(I)) * p(Y)). The term in parenthesis is introduced by the call to _loglik/subtensor_loglik, and the erroneous p(I) and p(Y) surrounding it are from the io_toposort walk through the original graph in loglik.

Since the io_toposort walk in loglik knows nothing about what _loglik has already done, it just keeps walking the nodes in q and "re-adds" the terms/nodes that were transformed and computed in _loglik/subtensor_loglik.

@brandonwillard brandonwillard force-pushed the reenable-loglik branch 2 times, most recently from ae852d1 to ed53f52 Compare June 6, 2021 19:27
@brandonwillard
Copy link
Member Author

brandonwillard commented Jun 6, 2021

Seems fine, knowing the PyMC3 version is working.

Remember, here we're developing the general approach that PyMC will also use, so we can't really fall back on that.

Do we need to treat the RandomVariables differently as we walk down the graph? I wonder if we could just call _loglik on all nodes equally, add them up and, once the logp graph is finished, replace all the random variables by the respective value_vars (only once at the root level of the loglik call).

That's essentially what we're doing. The only distinction is that we must produce log-likelihoods for the nodes with value variables specified (i.e. the keys in value_vars)—everything else is conditioned upon.

Since the only two _loglik implementations are actually transforms, this is what I mean when I say we should apply the transforms first, then compute the sum as we currently are. Also, in that case, when a value variable is given to the output of a node that isn't a RandomVariable, we can be sure that our transforms weren't able to produce a measure for that sub-graph, and we can raise an error.

I've added an example in the docstring that should clarify what loglik actually does and what its parameters mean. With that in mind, we should rename loglik to prob, since that's really what we're computing.

@brandonwillard
Copy link
Member Author

@aesara-devs, should we rename loglik to something more general like logmeasure or logprob? We should also reconsider the name logpdf.

The logpdf implementations could simply go back to being the dispatch implementations for logmeasure/logprob/whatever, since calling the latter will yield the same result as the former anyway.

@ricardoV94
Copy link
Contributor

logprob sounds fine (and _logprob for the individual "dispatched" terms). Measure might be more close to the "definition" but probably more foreign to most people.

@brandonwillard brandonwillard force-pushed the reenable-loglik branch 2 times, most recently from 90bd831 to a802156 Compare June 7, 2021 06:20
@brandonwillard
Copy link
Member Author

brandonwillard commented Jun 7, 2021

All right, loglik (name pending) now operates as proposed/intended. It first applies a series of RandomVariable-related transforms and then computes the designated log-probability.

The old *Subtensor* lifting functionality works now, and we can easily add @ricardoV94's affine transforms (see pymc-devs/pymc#4653) —as well as numerous other convolutions—to the mix via additions to RVSinker! Simply put, we should be set up so that we can focus on graph rewrites that bring more and more sophisticated automatic log-probability functionality.

All that remains is to determine how we want to handle the old incsubtensor_loglik, but that's much less of a concern, since it's just a matter of changing the value variable associated with such a term. Also, it was really only designed to account for the missing data features in PyMC v4, and it can be done in other ways.

For instance, I'm pretty sure the same missing data situation can be handled outside of loglik entirely by setting the respective entry in loglik's value_vars argument to a value of at.set_subtensor(value_var[idx], data), where value_var is the value variable of the random variable with missing data, idx are the missing indices, and data is the observed data.

At most, we might need to create a transform that strips the *IncSubtensor* Op and returns the underlying RandomVariable.

@brandonwillard brandonwillard added the graph rewriting Involves the implementation of rewrites to Aesara graphs label Jun 7, 2021
aeppl/loglik.py Outdated
if sum:
logpdf_var = at.sum(logpdf_var)
else:
raise NotImplementedError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to dispatch to other Ops in here (such as for the affine transform)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's all handled via rewrites in RVSinker.

Copy link
Contributor

@ricardoV94 ricardoV94 Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think I see what you have in mind. Is the goal to convert all graphs into basic random-variable nodes first and then simply rely on the respective logpdfs? Sounds a bit more limited than what I had in mind, but perhaps it's the best first approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds a bit more limited than what I had in mind, but perhaps it's the best first approach.

It's not limited in any way—at least not relative to the dispatch approach. It is significantly simpler, though.

For example, I just finished implementing everything, and, now, joint_logprob is very direct and shouldn't need to be changed in order to provide any of the features we've considered (e.g. support for convolutions, affine transforms, Scan/for-loops, etc.)

In the present form, joint_logprob takes a graph, rewrites it, and forms the sum of each RandomVariable's log-probability. That's it.

This neatly compartmentalizes the rewriting work by placing it squarely within the optimization/rewriting framework where it belongs.

@ricardoV94
Copy link
Contributor

It looks quite neat. Any plans for the specification of conditionals?

@brandonwillard
Copy link
Member Author

It looks quite neat. Any plans for the specification of conditionals?

That's already covered by the choice of entries in the rv_values argument. See the docstring for loglik.

Do you mean something else?

@ricardoV94
Copy link
Contributor

That's already covered by the choice of entries in the rv_values argument. See the docstring for loglik.

Do you mean something else?

I might be missing it. I mean the disambiguation when you have y = a + b vs y = a + b | b

@brandonwillard
Copy link
Member Author

That's already covered by the choice of entries in the rv_values argument. See the docstring for loglik.
Do you mean something else?

I might be missing it. I mean the disambiguation when you have y = a + b vs y = a + b | b

OK, I think I know what you're asking. Yeah, we can look into that soon.

@rlouf
Copy link
Member

rlouf commented Jun 7, 2021

logprob sounds fine (and _logprob for the individual "dispatched" terms).

Can we use logprob for the graph and terms using single dispatch? I'd find it more consistent. I otherwise know too little about aesara to give helpful comments on the code.

@brandonwillard
Copy link
Member Author

brandonwillard commented Jun 10, 2021

While trying to create a README.rst, I realized that we really needed some good pretty printing to demonstrate these new features, so I added it.

@brandonwillard brandonwillard force-pushed the reenable-loglik branch 2 times, most recently from f5f9471 to 56f1021 Compare June 10, 2021 04:49
@brandonwillard brandonwillard force-pushed the reenable-loglik branch 4 times, most recently from 7f4c87f to fad89a2 Compare June 10, 2021 05:24
@ricardoV94
Copy link
Contributor

It looks great but I think I still have the same conceptual issue.

How do you add arbitrary expressions to the logp graph when these cannot just be written in the form of of basic existing random variables.

For instance how would you add the Jacobian if a user wanted to define a log-beta?

y_rv = exp(beta(2, 2))
joint_logprob(y_rv, {y_rv : y_value}

The graph would be (I think)

beta_logpdf(log(y_value), 2, 2) - log(y_value)

It don't see in the current approach how you add the second tem to the graph. I see how you rewrite the exp(beta) to be beta(log(y_value))

Am I missing it or is something that was not supposed to be implemented or just for now.

Otherwise the joint prob looks really neat and the graph optimization at the beginning looks promising for some crazy experiments :D

@brandonwillard
Copy link
Member Author

brandonwillard commented Jun 10, 2021

It don't see in the current approach how you add the second term to the graph.

Which graph? The transforms are applied to the "sample-space" graph (i.e. the one with RandomVariables), not the "measure-space" graph (i.e. the joint log-probability), and adding Jacobian-like terms to the former wouldn't help.

There are a couple ways to accomplish what you're asking via the sample-space graph alone, and another that involves using Features like we currently are for updates to the value variable mappings.

For the former, we can write optimizations that generate specialized RandomVariables and do whatever we want via their _logprobs.

@brandonwillard brandonwillard merged commit 563707e into aesara-devs:main Jun 10, 2021
@brandonwillard brandonwillard deleted the reenable-loglik branch June 10, 2021 06:55
@ricardoV94
Copy link
Contributor

For the former, we can write optimizations that generate specialized RandomVariables and do whatever we want via their _logprobs.

I think that's the element I was missing. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
graph rewriting Involves the implementation of rewrites to Aesara graphs important This label is used to indicate priority over things not given this label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants