Optimization #1953

Closed. Wants to merge 12 commits.

Conversation

ferrine (Member) commented Mar 25, 2017

Bayesian optimization over the posterior latent space is an interesting class of problem that becomes practical with this PR. A notebook with a toy example is provided.

@fonnesbeck you can find an application of Histogram to SVGD there.
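
For context, a minimal sketch (hedged) of the workflow this PR targets. Histogram is the class introduced by this PR (later renamed Empirical in #2027), so this only runs against this branch; the Optimizer call at the end is quoted from the thread rather than tested.

    import numpy as np
    import pymc3 as pm

    # Toy data from a quadratic curve, mirroring the notebook's example.
    rng = np.random.RandomState(42)
    x_obs = rng.uniform(-3, 3, size=100)
    y_obs = 2 * x_obs ** 2 - x_obs + 0.5 + rng.normal(0, 0.1, size=100)

    with pm.Model() as model:
        abc = pm.Normal('abc', mu=0, sd=10, shape=3)
        mu = abc[0] * x_obs ** 2 + abc[1] * x_obs + abc[2]
        pm.Normal('y', mu=mu, sd=0.1, observed=y_obs)
        trace = pm.sample(1000)
        # Histogram wraps the posterior draws as particles with symbolic
        # access, so downstream expressions can be evaluated under them.
        histogram = pm.Histogram(trace)

    # The PR's Optimizer then minimizes a symbolic loss under the particles,
    # roughly: opt = pm.Optimizer(approx, y.sum(), [x], optimizer=sgd)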

ferrine (Member, Author) commented Mar 26, 2017

@twiecki @fonnesbeck I think I need a review. Do we need Bayesian optimization in PyMC3?

springcoil (Contributor)

Just a comment - and adding my two cents here. I've got no moral objection to having Bayesian Optimization in PyMC3, if you're willing to maintain the code. I'll give it a quick review now.

fonnesbeck (Member)

I think this could be useful for fitting GP models that include discrete parameters, so I'm in favor.


Review comment on:

    Parameters
    ----------
    kwargs : kwargs for theano.function

(Contributor) Hmm, this could do with a bit more fleshing out in the docstring, I think. Maybe with an example.


springcoil (Contributor)

Does anyone know any good papers to read on this? Would one of the application areas be something like Bandit problems?

ferrine (Member, Author) commented Mar 26, 2017

@springcoil You can see this message from the PR thread; I've described some applications there.

twiecki (Member) commented Mar 27, 2017

So this replaces sample_ppc?

ferrine (Member, Author) commented Mar 27, 2017

@twiecki how does it replace it?

Review comment on:

    def fit(self, n=5000, callbacks=()):
        """
        Perform optimization steps
        Parameters

(Member) Needs a new-line before Parameters, I think.

twiecki (Member) commented Mar 29, 2017

@ferrine the notebook needs way more description; it's not clear to me at all what you are trying to optimize, where this could be useful, etc. For example, in opt = pm.Optimizer(approx, y.sum(), [x], optimizer=sgd), why do you sum y?

fonnesbeck (Member) commented Mar 29, 2017

This could use some actual documentation (not just docstrings), that is, adding to the files in pymc3/docs so that there is a guide to usage. At least open a PR for the docs, if not here.

See #1968

springcoil (Contributor)

Yeah I'd like to encourage actual documentation for this as well.

ferrine (Member, Author) commented Apr 7, 2017

I've updated the notebook. Hope it conveys the purpose of this tool better.

twiecki (Member) commented Apr 7, 2017

@ferrine Much better, this is really neat.

  • I would describe what histogram.apply_replacements(y, deterministic=True).eval() does.
  • So Histogram is like a trace with special properties? Does it also work for ADVI? What about NUTS?
  • from itertools import zip_longest is not used.
  • y_ = abc[0]*x_**2 + abc[1]*x_ + abc[2]: can't you use the f() you defined above instead?
  • Quite a few typos: Condider, bayesian -> Bayesian.
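
As a gloss on the first bullet, a hedged sketch, continuing the variables abc and histogram from the sketch near the top of the thread. As I read the notebook, apply_replacements swaps the model's random variables inside a symbolic expression for the approximation's particles; with deterministic=True it plugs in a point estimate instead of random draws, so the expression can be evaluated directly.

    import numpy as np

    # Symbolic curve on a grid; `abc` is still a free model variable here.
    x_grid = np.linspace(-3, 3, 50)
    y = abc[0] * x_grid ** 2 + abc[1] * x_grid + abc[2]

    # Replace `abc` with a deterministic summary of the stored particles,
    # leaving a tensor with no model randomness in it.
    y_det = histogram.apply_replacements(y, deterministic=True)
    print(y_det.eval())  # a concrete array: the posterior-mean curve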

ferrine (Member, Author) commented Apr 7, 2017

Okay, I'll take it all into account and update the notebook soon.

ferrine (Member, Author) commented Apr 7, 2017

Done

twiecki (Member) commented Apr 8, 2017

@ferrine The docs are really enlightening. I wonder if Histogram is the right name. What about Posterior?

Review comment on:

    class Optimizer(object):
        """
        Optimization with posterior replacements
        Parameters

(Member) Also needs a new-line.

ferrine (Member, Author) commented Apr 8, 2017

@twiecki I think Posterior does not reveal the nature of the class, which just stores particles. It could confuse users, and if the trace is bad it isn't even the posterior.

twiecki (Member) commented Apr 8, 2017

Maybe, but Histogram is certainly confusing already. What about it makes it a histogram? PosteriorApprox, PosteriorSamples, PosteriorEst?

Another question is whether we could marry this with Trace somehow.

ferrine (Member, Author) commented Apr 8, 2017

@twiecki what about Particles?

ferrine (Member, Author) commented Apr 8, 2017

It is already married with Trace, or what do you mean?

twiecki (Member) commented Apr 8, 2017

Particles isn't bad, but I don't like the connotation with particle samplers.

I mean to merge the functionality of Histogram into Trace. I don't necessarily think it's a good idea, but there are definitely parallels. As I understand it, Histogram = Trace + computational capabilities.

ferrine (Member, Author) commented Apr 8, 2017

We support different backends in Trace, and I'm not sure all of them are possible in this paradigm. That's why I only use the common Trace interface to initialize Histogram and then immediately forget about it.

twiecki (Member) commented Apr 8, 2017

I still think we should call it something with Posterior, because that's what the computations happen on.

ferrine (Member, Author) commented Apr 8, 2017

Maybe Empirical? An empirical distribution is one that consists of samples only. I do not like Posterior there, as I inherit from Approximation, and Posterior(Approximation) seems like the wrong name from that point of view. Empirical(Approximation) is more intuitive.

twiecki (Member) commented Apr 8, 2017

I see, what does Approximation do again? Is that the base class for VI?

ferrine (Member, Author) commented Apr 8, 2017

That's the base class for VI results. It can perform sampling

twiecki (Member) commented Apr 8, 2017

I think it's time we thought a bit more deeply about the structure and naming of variational. Could you help by providing an overview of the classes, their purpose, and the inheritance structure?

twiecki (Member) commented Apr 14, 2017

@ferrine I looked at the code again, and Histogram is not the only class inheriting from Approximation; there are also MeanField and FullRank, for which inheritance makes sense. Which functionality does Histogram use from Approximation?

jsalvatier (Member) commented Apr 14, 2017

Very interesting!

If this is for Bayesian optimization, why is there so much variational-specific stuff? You can do Bayesian optimization with other methods. Oh, I see: this is why Thomas is talking about merging the MCMC and variational representations.

What exactly is Histogram's role? If I'm reading this correctly, it is the symbolic representation of the posterior (in this case stored as samples). You need that so that you can evaluate the model at specific points to compute the expected loss.

For Bayesian optimization, are you always going to go through sampling? My guess is yes.

If so, then you just need a common representation for a posterior distribution that can be used to generate samples. It should be a super simple representation. And then you need various inference methods that can return that representation.
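
To make the proposal concrete, a hypothetical sketch of such a "super simple representation"; none of these names exist in the PR. A posterior is just stored joint draws, plus the ability to resample and take Monte Carlo expectations, and every inference method (MCMC, VI, SVGD) would return one.

    import numpy as np

    class SampledPosterior(object):
        """A posterior represented only by stored joint draws (hypothetical)."""

        def __init__(self, draws):
            # draws: dict mapping variable name -> array of shape (n_draws, ...)
            self.draws = draws
            self.n = len(next(iter(draws.values())))

        def sample(self, size, random_state=None):
            # Resample the stored draws with replacement.
            rng = np.random.RandomState(random_state)
            idx = rng.randint(0, self.n, size=size)
            return {name: vals[idx] for name, vals in self.draws.items()}

        def expect(self, fn):
            # Monte Carlo estimate of E[fn(draw)] over the stored draws.
            return np.mean([fn({k: v[i] for k, v in self.draws.items()})
                            for i in range(self.n)], axis=0)

    # Expected loss at a candidate point x = 1.5 under a fake posterior over abc:
    posterior = SampledPosterior({'abc': np.random.randn(500, 3)})
    loss = posterior.expect(
        lambda d: d['abc'][0] * 1.5 ** 2 + d['abc'][1] * 1.5 + d['abc'][2])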

jsalvatier (Member) commented Apr 14, 2017

I found the notebook a bit confusing, though I think I figured it out.

Definitely call histogram something else. It's hard to think about it this way. Maybe posterior_samples or empirical_posterior?

I would put the section on optimization before the section showing that you can do regular inference (especially since that part seems to be standard variational inference?).

I would also try to make the Optimization section a bit clearer. For example, I would explicitly call y_ "loss" instead. It took me a while to think through what exactly y_ was and what we were doing to it.

You should also draw the minimum you found on a graph with the data and such.

aseyboldt (Member)

I don't like the name Approximation either. I think the problem is that the name tells us how something is stored (namely, approximately) and not what it is.
Why not call Approximation Posterior, then? The child classes would then represent different (approximate) ways to specify a posterior. Histogram could then be called Sampled (or maybe even Trace, but that might be confusing if we have the other trace type around), which would match the other names, MeanField etc., in that the name specifies how the posterior is specified. And probably also rename the module to posterior.
There wouldn't be a reason to put those in the global pymc3 namespace, would there?

jsalvatier (Member)

Thoughts on Optimizer:

I would prefer a stateless function rather than an object. State is generally bad; it makes things hard to reason about.

The history-storing callback should definitely be inside of Optimizer.

Optimizer should probably also return the loss-minimizing x, no?

I want to support random search as an optimizer, because hyperparameter optimization often proceeds that way.
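
A hypothetical sketch of the stateless alternative; the names are invented here, not PR API. A plain function takes an expected-loss callable and returns the best point, the best loss, and the history, with random search as one interchangeable optimizer.

    import numpy as np

    def optimize(expected_loss, bounds, n_iter=1000, random_state=None):
        """Minimize expected_loss over the box `bounds` by random search.

        expected_loss: callable mapping a point (1-d array) to a scalar,
            e.g. a Monte Carlo estimate of posterior-expected loss.
        bounds: sequence of (low, high) pairs, one per input dimension.
        Returns (best_x, best_loss, history).
        """
        rng = np.random.RandomState(random_state)
        lows, highs = map(np.asarray, zip(*bounds))
        best_x, best_loss, history = None, np.inf, []
        for _ in range(n_iter):
            x = rng.uniform(lows, highs)
            loss = expected_loss(x)
            if loss < best_loss:
                best_x, best_loss = x, loss
            history.append(best_loss)
        return best_x, best_loss, history

    # Toy usage: the minimizer of (x - 1)^2 on [-3, 3] is found near x = 1.
    best_x, best_loss, history = optimize(
        lambda x: (x[0] - 1.0) ** 2, bounds=[(-3.0, 3.0)], n_iter=500,
        random_state=0)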

ferrine (Member, Author) commented Apr 15, 2017

@aseyboldt Posterior is a bad name for the base class for several reasons.

  1. It is not actually the true posterior.

  2. The child class names would not be consistent:
    compare class MeanField(Posterior) with class MeanField(Approximation).

    The child classes would then represent different (approximate) ways to specify a posterior

    If they are approximations, they should be associated with approximations, not with the true posterior that we never know.

  3. rename the module to posterior.

    Maybe that will be needed in the future, when we decide to make a unified inference result.

@jsalvatier

  1. then you just need a common representation

    In my dreams, the pymc3 API will be changed a lot.

  2. Regarding the comments about the Jupyter notebook: when I return to this PR I'll polish it.

  3. State is generally bad

    Not always; it is better for fine-tuning, monitoring and debugging. If you are not satisfied with the result, there is no need to recompile step functions, etc.

As for renaming Histogram, I've considered Histogram -> Empirical.

twiecki (Member) commented Apr 15, 2017

Empirical seems to be the smallest common denominator. It's an empirical approximation from samples, which makes sense.

fonnesbeck (Member)

I'm on board with Empirical.

We can talk about API changes on the next (first!) monthly PyMC video call.

springcoil (Contributor)

When's the monthly call?

aseyboldt (Member) commented Apr 15, 2017 via email

fonnesbeck (Member) commented Apr 15, 2017

@aseyboldt it's empirical in the sense that it is constructed from samples rather than from a parametric function, i.e., the observed outcomes of a sampling procedure.

twiecki (Member) commented Apr 19, 2017

@aseyboldt Do you find that convincing?

@ferrine I think we should change it to Empirical for now, definitely an improvement over Histogram.

ferrine (Member, Author) commented Apr 19, 2017

Name change is going to be in #2027

aseyboldt (Member)

@twiecki I still think it's a bit strange to call draws from a posterior "empirical", but I don't feel that strongly about it. If you're all fine with it, then go ahead.

fonnesbeck (Member)

@aseyboldt I can see your point as well. What we have is analogous to a Dirichlet process, in that it represents some underlying distribution, but is constructed from a sample. What do we call that?

junpenglao (Member)

I still cannot really wrap my head around the application. So in this case you fitted the model, got the estimated parameters, and then used them to find the function minimum? That seems quite different from the usual application of Bayesian optimization.
I think it would be great to have Bayesian optimization in PyMC3: a small wrapper taking a black-box function and a pm.Model (e.g., a pymc3 GP model) as input, and finding the function minimum.
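
A hedged sketch of the kind of wrapper described above. It uses the pm.gp.Marginal API (which postdates this PR) as the surrogate and expected improvement as the acquisition function; bayes_opt_min and its arguments are invented here for illustration, not anything in this PR.

    import numpy as np
    import pymc3 as pm
    from scipy.stats import norm

    def bayes_opt_min(f, bounds, n_init=5, n_iter=10, seed=0):
        """Sequentially minimize a black-box f on the interval `bounds`."""
        rng = np.random.RandomState(seed)
        X = rng.uniform(bounds[0], bounds[1], size=(n_init, 1))
        y = np.array([f(xi[0]) for xi in X])
        grid = np.linspace(bounds[0], bounds[1], 200)[:, None]
        for _ in range(n_iter):
            # Refit a GP surrogate to all evaluations so far.
            with pm.Model():
                ls = pm.Gamma('ls', 2, 1)
                gp = pm.gp.Marginal(cov_func=pm.gp.cov.ExpQuad(1, ls=ls))
                gp.marginal_likelihood('obs', X=X, y=y, noise=1e-3)
                mp = pm.find_MAP()
                mu, var = gp.predict(grid, point=mp, diag=True)
            sd = np.sqrt(var)
            # Expected improvement over the best observed value so far.
            imp = y.min() - mu
            z = imp / sd
            ei = imp * norm.cdf(z) + sd * norm.pdf(z)
            x_next = grid[np.argmax(ei)]
            X = np.vstack([X, x_next[None, :]])
            y = np.append(y, f(x_next[0]))
        return X[np.argmin(y)][0], y.min()

    # Example: recover the minimum of a shifted parabola on [-3, 3].
    x_best, y_best = bayes_opt_min(lambda x: (x - 1.0) ** 2, bounds=(-3.0, 3.0))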

ferrine (Member, Author) commented Jul 16, 2017

Yeah. Here is a simple wrapper to optimize such objectives, but I do not like my API.

junpenglao (Member)

@ferrine I think we can come back to this when the VI API is stabilised.

ferrine (Member, Author) commented Jul 16, 2017

Sure

springcoil (Contributor)

Any update on this now that opvi is stable?

junpenglao (Member)

@springcoil we should wait until #2416 is merged. Also, maybe it would be better to spin this off into a project of its own (like GPflowOpt)?

ferrine (Member, Author) commented Aug 11, 2017

I like the idea. OPVI is WIP again and comes with cool new features and speed improvements as well. I don't plan to work on this PR this month.

twiecki (Member) commented Mar 20, 2018

@ferrine I'm closing this as it seems stale; feel free to reopen if you disagree.

twiecki closed this on Mar 20, 2018.