Optimization #1953

Closed. Wants to merge 12 commits.

Conversation

ferrine (Member) commented Mar 25, 2017

Bayesian optimization over the posterior latent space is an interesting class of problem that becomes practical with this PR. A notebook with a toy example is provided.

@fonnesbeck you can find an application of Histogram to SVGD there.
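
For context, a minimal sketch (hedged) of the workflow this PR targets. Histogram is the class introduced by this PR (later renamed Empirical in #2027), so this only runs against this branch; the Optimizer call at the end is quoted from the thread rather than tested.

    import numpy as np
    import pymc3 as pm

    # Toy data from a quadratic curve, mirroring the notebook's example.
    rng = np.random.RandomState(42)
    x_obs = rng.uniform(-3, 3, size=100)
    y_obs = 2 * x_obs ** 2 - x_obs + 0.5 + rng.normal(0, 0.1, size=100)

    with pm.Model() as model:
        abc = pm.Normal('abc', mu=0, sd=10, shape=3)
        mu = abc[0] * x_obs ** 2 + abc[1] * x_obs + abc[2]
        pm.Normal('y', mu=mu, sd=0.1, observed=y_obs)
        trace = pm.sample(1000)
        # Histogram wraps the posterior draws as particles with symbolic
        # access, so downstream expressions can be evaluated under them.
        histogram = pm.Histogram(trace)

    # The PR's Optimizer then minimizes a symbolic loss under the particles,
    # roughly: opt = pm.Optimizer(approx, y.sum(), [x], optimizer=sgd)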

ferrine (Member, Author) commented Mar 26, 2017

@twiecki @fonnesbeck I think I need a review. Do we need Bayesian optimization in PyMC3?

springcoil (Contributor)

Just a comment - and adding my two cents here. I've got no moral objection to having Bayesian Optimization in PyMC3, if you're willing to maintain the code. I'll give it a quick review now.

fonnesbeck (Member)

I think this could be useful for fitting GP models that include discrete parameters, so I'm in favor.


Review comment on:

    Parameters
    ----------
    kwargs : kwargs for theano.function

(Contributor) Hmm, this could do with a bit more fleshing out in the docstring, I think. Maybe with an example.


springcoil (Contributor)

Does anyone know any good papers to read on this? Would one of the application areas be something like Bandit problems?

ferrine (Member, Author) commented Mar 26, 2017

@springcoil You can see this message from the PR thread; I've described some applications there.

twiecki (Member) commented Mar 27, 2017

So this replaces sample_ppc?

ferrine (Member, Author) commented Mar 27, 2017

@twiecki how does it replace it?

Review comment on:

    def fit(self, n=5000, callbacks=()):
        """
        Perform optimization steps
        Parameters

(Member) Needs a new-line before Parameters, I think.

twiecki (Member) commented Mar 29, 2017

@ferrine the notebook needs way more description; it's not clear to me at all what you are trying to optimize, where this could be useful, etc. For example, in opt = pm.Optimizer(approx, y.sum(), [x], optimizer=sgd), why do you sum y?

fonnesbeck (Member) commented Mar 29, 2017

This could use some actual documentation (not just docstrings), that is, adding to the files in pymc3/docs so that there is a guide to usage. At least open a PR for the docs, if not here.

See #1968

springcoil (Contributor)

Yeah I'd like to encourage actual documentation for this as well.

ferrine (Member, Author) commented Apr 7, 2017

I've updated the notebook. Hope it conveys the purpose of this tool better.

twiecki (Member) commented Apr 7, 2017

@ferrine Much better, this is really neat.

  • I would describe what histogram.apply_replacements(y, deterministic=True).eval() does.
  • So Histogram is like a trace with special properties? Does it also work for ADVI? What about NUTS?
  • from itertools import zip_longest is not used.
  • y_ = abc[0]*x_**2 + abc[1]*x_ + abc[2]: can't you use the f() you defined above instead?
  • Quite a few typos: Condider, bayesian -> Bayesian.
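
As a gloss on the first bullet, a hedged sketch, continuing the variables abc and histogram from the sketch near the top of the thread. As I read the notebook, apply_replacements swaps the model's random variables inside a symbolic expression for the approximation's particles; with deterministic=True it plugs in a point estimate instead of random draws, so the expression can be evaluated directly.

    import numpy as np

    # Symbolic curve on a grid; `abc` is still a free model variable here.
    x_grid = np.linspace(-3, 3, 50)
    y = abc[0] * x_grid ** 2 + abc[1] * x_grid + abc[2]

    # Replace `abc` with a deterministic summary of the stored particles,
    # leaving a tensor with no model randomness in it.
    y_det = histogram.apply_replacements(y, deterministic=True)
    print(y_det.eval())  # a concrete array: the posterior-mean curve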

ferrine (Member, Author) commented Apr 7, 2017

Okay, I'll take it all into account and update the notebook soon.

ferrine (Member, Author) commented Apr 7, 2017

Done

twiecki (Member) commented Apr 8, 2017

@ferrine The docs are really enlightening. I wonder if Histogram is the right name. What about Posterior?

Review comment on:

    class Optimizer(object):
        """
        Optimization with posterior replacements
        Parameters

(Member) Also needs a new-line.

ferrine (Member, Author) commented Apr 8, 2017

@twiecki I think Posterior does not reveal the nature of the class, which just stores particles. It could confuse users, and if the trace is bad it isn't even the posterior.

twiecki (Member) commented Apr 8, 2017

Maybe, but Histogram is certainly confusing already. What about it makes it a histogram? PosteriorApprox, PosteriorSamples, PosteriorEst?

Another question is whether we could marry this with Trace somehow.

ferrine (Member, Author) commented Apr 8, 2017

@twiecki what about Particles?

ferrine (Member, Author) commented Apr 8, 2017

It is already married with Trace, or what do you mean?

twiecki (Member) commented Apr 8, 2017

Particles isn't bad, but I don't like the connotation with particle samplers.

I mean to merge the functionality of Histogram into Trace. I don't necessarily think it's a good idea, but there are definitely parallels. As I understand it, Histogram = Trace + computational capabilities.

ferrine (Member, Author) commented Apr 8, 2017

We support different backends in Trace, and I'm not sure all of them are possible in this paradigm. That's why I only use the common Trace interface to initialize Histogram and then immediately forget about it.

twiecki (Member) commented Apr 8, 2017

I still think we should call it something with Posterior, because that's what the computations happen on.

ferrine (Member, Author) commented Apr 8, 2017

Maybe Empirical? An empirical distribution is one that consists of samples only. I do not like Posterior there, as I inherit from Approximation, and Posterior(Approximation) seems like the wrong name from that point of view. Empirical(Approximation) is more intuitive.

twiecki (Member) commented Apr 8, 2017

I see, what does Approximation do again? Is that the base class for VI?

ferrine (Member, Author) commented Apr 8, 2017

That's the base class for VI results. It can perform sampling

twiecki (Member) commented Apr 8, 2017

I think it's time we thought a bit more deeply about the structure and naming of variational. Could you help by providing an overview of the classes, their purpose, and the inheritance structure?

twiecki (Member) commented Apr 14, 2017

@ferrine I looked at the code again, and Histogram is not the only class inheriting from Approximation; there are also MeanField and FullRank, for which inheritance makes sense. Which functionality does Histogram use from Approximation?

jsalvatier (Member) commented Apr 14, 2017

Very interesting!

If this is for Bayesian optimization, why is there so much variational-specific stuff? You can do Bayesian optimization with other methods. Oh, I see: this is why Thomas is talking about merging the MCMC and variational representations.

What exactly is Histogram's role? If I'm reading this correctly, it is the symbolic representation of the posterior (in this case stored as samples). You need that so that you can evaluate the model at specific points to compute the expected loss.

For Bayesian optimization, are you always going to go through sampling? My guess is yes.

If so, then you just need a common representation for a posterior distribution that can be used to generate samples. It should be a super simple representation. And then you need various inference methods that can return that representation.
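
To make the proposal concrete, a hypothetical sketch of such a "super simple representation"; none of these names exist in the PR. A posterior is just stored joint draws, plus the ability to resample and take Monte Carlo expectations, and every inference method (MCMC, VI, SVGD) would return one.

    import numpy as np

    class SampledPosterior(object):
        """A posterior represented only by stored joint draws (hypothetical)."""

        def __init__(self, draws):
            # draws: dict mapping variable name -> array of shape (n_draws, ...)
            self.draws = draws
            self.n = len(next(iter(draws.values())))

        def sample(self, size, random_state=None):
            # Resample the stored draws with replacement.
            rng = np.random.RandomState(random_state)
            idx = rng.randint(0, self.n, size=size)
            return {name: vals[idx] for name, vals in self.draws.items()}

        def expect(self, fn):
            # Monte Carlo estimate of E[fn(draw)] over the stored draws.
            return np.mean([fn({k: v[i] for k, v in self.draws.items()})
                            for i in range(self.n)], axis=0)

    # Expected loss at a candidate point x = 1.5 under a fake posterior over abc:
    posterior = SampledPosterior({'abc': np.random.randn(500, 3)})
    loss = posterior.expect(
        lambda d: d['abc'][0] * 1.5 ** 2 + d['abc'][1] * 1.5 + d['abc'][2])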

jsalvatier (Member) commented Apr 14, 2017

I found the notebook a bit confusing, though I think I figured it out.

Definitely call histogram something else. It's hard to think about it this way. Maybe posterior_samples or empirical_posterior?

I would put the section on optimization before the section showing that you can do regular inference (especially since that part seems to be standard variational inference?).

I would also try to make the Optimization section a bit clearer. For example, I would explicitly call y_ "loss" instead. It took me a while to think through what exactly y_ was and what we were doing to it.

You should also draw the minimum you found on a graph with the data and such.

aseyboldt (Member)

I don't like the name Approximation either. I think the problem is that the name tells us how something is stored (namely, approximately) and not what it is.
Why not call Approximation Posterior, then? The child classes would then represent different (approximate) ways to specify a posterior. Histogram could then be called Sampled (or maybe even Trace, but that might be confusing if we have the other trace type around), which would match the other names, MeanField etc., in that the name specifies how the posterior is specified. And probably also rename the module to posterior.
There wouldn't be a reason to put those in the global pymc3 namespace, would there?

jsalvatier (Member)

Thoughts on Optimizer:

I would prefer a stateless function rather than an object. State is generally bad; it makes things hard to reason about.

The history-storing callback should definitely be inside of Optimizer.

Optimizer should probably also return the loss-minimizing x, no?

I want to support random search as an optimizer, because hyperparameter optimization often proceeds that way.
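
A hypothetical sketch of the stateless alternative; the names are invented here, not PR API. A plain function takes an expected-loss callable and returns the best point, the best loss, and the history, with random search as one interchangeable optimizer.

    import numpy as np

    def optimize(expected_loss, bounds, n_iter=1000, random_state=None):
        """Minimize expected_loss over the box `bounds` by random search.

        expected_loss: callable mapping a point (1-d array) to a scalar,
            e.g. a Monte Carlo estimate of posterior-expected loss.
        bounds: sequence of (low, high) pairs, one per input dimension.
        Returns (best_x, best_loss, history).
        """
        rng = np.random.RandomState(random_state)
        lows, highs = map(np.asarray, zip(*bounds))
        best_x, best_loss, history = None, np.inf, []
        for _ in range(n_iter):
            x = rng.uniform(lows, highs)
            loss = expected_loss(x)
            if loss < best_loss:
                best_x, best_loss = x, loss
            history.append(best_loss)
        return best_x, best_loss, history

    # Toy usage: the minimizer of (x - 1)^2 on [-3, 3] is found near x = 1.
    best_x, best_loss, history = optimize(
        lambda x: (x[0] - 1.0) ** 2, bounds=[(-3.0, 3.0)], n_iter=500,
        random_state=0)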

ferrine (Member, Author) commented Apr 15, 2017

@aseyboldt Posterior is a bad name for the base class for several reasons.

  1. It is not actually the true posterior.

  2. The child class names would not be consistent:
    compare class MeanField(Posterior) with class MeanField(Approximation).

    The child classes would then represent different (approximate) ways to specify a posterior

    If they are approximations, they should be associated with approximations, not with the true posterior that we never know.

  3. rename the module to posterior.

    Maybe that will be needed in the future, when we decide to make a unified inference result.

@jsalvatier

  1. then you just need a common representation

    In my dreams, the pymc3 API will be changed a lot.

  2. Regarding the comments about the Jupyter notebook: when I return to this PR I'll polish it.

  3. State is generally bad

    Not always; it is better for fine-tuning, monitoring and debugging. If you are not satisfied with the result, there is no need to recompile step functions, etc.

As for renaming Histogram, I've considered Histogram -> Empirical.

twiecki (Member) commented Apr 15, 2017

Empirical seems to be the smallest common denominator. It's an empirical approximation from samples, which makes sense.

fonnesbeck (Member)

I'm on board with Empirical.

We can talk about API changes on the next (first!) monthly PyMC video call.

springcoil (Contributor)

When's the monthly call?

aseyboldt (Member) commented Apr 15, 2017 via email

fonnesbeck (Member) commented Apr 15, 2017

@aseyboldt it's empirical in the sense that it is constructed from samples rather than from a parametric function, i.e., the observed outcomes of a sampling procedure.

twiecki (Member) commented Apr 19, 2017

@aseyboldt Do you find that convincing?

@ferrine I think we should change it to Empirical for now, definitely an improvement over Histogram.

ferrine (Member, Author) commented Apr 19, 2017

Name change is going to be in #2027

aseyboldt (Member)

@twiecki I still think it's a bit strange to call draws from a posterior "empirical", but I don't feel that strongly about it. If you're all fine with it, then go ahead.

fonnesbeck (Member)

@aseyboldt I can see your point as well. What we have is analogous to a Dirichlet process, in that it represents some underlying distribution, but is constructed from a sample. What do we call that?

junpenglao (Member)

I still cannot really wrap my head around the application. So in this case you fitted the model, got the estimated parameters, and then used them to find the function minimum? That seems quite different from the usual application of Bayesian optimization.
I think it would be great to have Bayesian optimization in PyMC3: a small wrapper taking a black-box function and a pm.Model (e.g., a pymc3 GP model) as input, and finding the function minimum.
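
A hedged sketch of the kind of wrapper described above. It uses the pm.gp.Marginal API (which postdates this PR) as the surrogate and expected improvement as the acquisition function; bayes_opt_min and its arguments are invented here for illustration, not anything in this PR.

    import numpy as np
    import pymc3 as pm
    from scipy.stats import norm

    def bayes_opt_min(f, bounds, n_init=5, n_iter=10, seed=0):
        """Sequentially minimize a black-box f on the interval `bounds`."""
        rng = np.random.RandomState(seed)
        X = rng.uniform(bounds[0], bounds[1], size=(n_init, 1))
        y = np.array([f(xi[0]) for xi in X])
        grid = np.linspace(bounds[0], bounds[1], 200)[:, None]
        for _ in range(n_iter):
            # Refit a GP surrogate to all evaluations so far.
            with pm.Model():
                ls = pm.Gamma('ls', 2, 1)
                gp = pm.gp.Marginal(cov_func=pm.gp.cov.ExpQuad(1, ls=ls))
                gp.marginal_likelihood('obs', X=X, y=y, noise=1e-3)
                mp = pm.find_MAP()
                mu, var = gp.predict(grid, point=mp, diag=True)
            sd = np.sqrt(var)
            # Expected improvement over the best observed value so far.
            imp = y.min() - mu
            z = imp / sd
            ei = imp * norm.cdf(z) + sd * norm.pdf(z)
            x_next = grid[np.argmax(ei)]
            X = np.vstack([X, x_next[None, :]])
            y = np.append(y, f(x_next[0]))
        return X[np.argmin(y)][0], y.min()

    # Example: recover the minimum of a shifted parabola on [-3, 3].
    x_best, y_best = bayes_opt_min(lambda x: (x - 1.0) ** 2, bounds=(-3.0, 3.0))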

ferrine (Member, Author) commented Jul 16, 2017

Yeah. Here is a simple wrapper to optimize such objectives, but I do not like my API.

junpenglao (Member)

@ferrine I think we can come back to this when the VI API is stabilised.

ferrine (Member, Author) commented Jul 16, 2017

Sure

springcoil (Contributor)

Any update on this now that opvi is stable?

junpenglao (Member)

@springcoil we should wait until #2416 is merged. Also, maybe it would be better to spin this off into a project of its own (like GPflowOpt)?

ferrine (Member, Author) commented Aug 11, 2017

I like the idea. OPVI is WIP again and comes with cool new features and speed improvements as well. I don't plan to work on this PR this month.

twiecki (Member) commented Mar 20, 2018

@ferrine I'm closing this as it seems stale; feel free to reopen if you disagree.

twiecki closed this on Mar 20, 2018.