Optimization #1953

Conversation
@twiecki @fonnesbeck I think I need a review. Do we need Bayesian optimization in PyMC3?

Just a comment, and adding my two cents here. I've got no moral objection to having Bayesian Optimization in PyMC3, if you're willing to maintain the code. I'll give it a quick review now.

I think this could be useful for fitting GP models that include discrete parameters, so I'm in favor.
pymc3/optimization.py (outdated)

    Parameters
    ----------
    kwargs : kwargs for theano.function

Hmm, this could do with a bit more fleshing out in the docstring, I think. Maybe with an example.
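For what it's worth, a fleshed-out version might look something like this (the method name and the example usage are hypothetical, just to illustrate the numpydoc shape):

```python
def function(self, **kwargs):
    """Compile the objective into a callable Theano function.

    Parameters
    ----------
    kwargs : dict
        Keyword arguments passed through to `theano.function`,
        e.g. `mode='FAST_COMPILE'` or `profile=True`.

    Examples
    --------
    >>> opt = Optimizer(objective)          # hypothetical usage
    >>> step = opt.function(profile=True)   # compile with profiling
    """
```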
Does anyone know any good papers to read on this? Would one of the application areas be something like bandit problems?

@springcoil You can see this message in the PR thread; I've described some applications there.

So this replaces sample_ppc?

@twiecki how would it replace that?
pymc3/optimization.py (outdated)

    def fit(self, n=5000, callbacks=()):
        """
        Perform optimization steps
        Parameters

Needs a new line before Parameters, I think.
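Concretely, numpydoc wants a blank line before the section header, something like this (the parameter descriptions are my guesses from the signature):

```python
def fit(self, n=5000, callbacks=()):
    """Perform optimization steps.

    Parameters
    ----------
    n : int
        Number of optimization steps to perform.
    callbacks : iterable of callables
        Called after every step, e.g. to record the loss history.
    """
```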
@ferrine the notebook needs way more description; it's not at all clear to me what you are trying to optimize, where this could be useful, etc. For example:

This could use some actual documentation (not just docstrings), that is, adding to the files in the docs. See #1968

Yeah, I'd like to encourage actual documentation for this as well.

I've updated the notebook. Hopefully it reveals the purpose of this tool better.
@ferrine Much better, this is really neat.
Okay, I'll take it all into account and change the notebook soon.

Done

@ferrine The docs are really enlightening. I wonder if
pymc3/optimization.py (outdated)

    class Optimizer(object):
        """
        Optimization with posterior replacements
        Parameters

Also needs a new line there.
@twiecki I think

Maybe. Another question is whether we could marry this with Trace somehow?

@twiecki what about

It is already married with Trace, or what do you mean?

I mean merging the functionality of Histogram into Trace. I don't necessarily think it's a good idea, but there are definitely parallels. As I understand it, Histogram = Trace + computational capabilities.

We support different backends in Trace; I'm not sure all of them are possible in this paradigm. That's why I only use the common Trace interface to initialize Histogram and then immediately forget about it.
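For readers following along, the pattern being described is presumably something like this (the constructor and `sample` signature are assumptions based on this thread, not the final API):

```python
import pymc3 as pm
# Histogram is the class added in this PR; its import path is assumed:
# from pymc3.optimization import Histogram

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=1.)
    trace = pm.sample(1000)

    # The common Trace interface is used once, to initialize the
    # sample-based approximation; after that, the backend-specific
    # trace object is no longer needed.
    approx = Histogram(trace)   # later renamed to Empirical
    draws = approx.sample(500)  # sampling goes through the approximation
```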
I still think we should call it something with Posterior in it, because that's what the computations happen on.

Maybe

I see; what does Approximation do again? That's the base class for VI?

That's the base class for VI results. It can perform sampling.

I think it's time we think a bit deeper about the structure and naming of

@ferrine I looked at the code again and
Very interesting! If this is for Bayesian optimization, why is there so much variational-specific stuff? You can do Bayesian optimization with other methods. Oh, I see: this is why Thomas is talking about merging the MCMC and variational representations.

What exactly is Histogram's role? If I'm reading this correctly, it is the symbolic representation of the posterior (in this case stored as samples). You need that so you can evaluate the model at specific points to compute the expected loss.

For Bayesian optimization, are you always going to go through sampling? My guess is yes. If so, then you just need a common representation for a posterior distribution that can be used to generate samples. It should be a super simple representation. And then you need various inference methods that can return that representation.
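A minimal sketch of the kind of common representation being suggested (all names here are hypothetical):

```python
import numpy as np

class PosteriorRepresentation(object):
    """The one thing every inference result should support:
    producing draws from the (approximate) posterior."""

    def sample(self, n):
        raise NotImplementedError

class SampleBased(PosteriorRepresentation):
    """A posterior stored as samples, e.g. from MCMC; this is
    the role Histogram plays in the PR."""

    def __init__(self, draws):
        self.draws = np.asarray(draws)

    def sample(self, n):
        # Resample with replacement from the stored draws.
        idx = np.random.randint(0, len(self.draws), size=n)
        return self.draws[idx]
```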
I found the notebook a bit confusing, though I think I figured it out. Definitely call

I would put the section on optimization before the section showing that you can do regular inference (especially since that part seems to be standard variational inference?). I would also try to make the Optimization section a bit clearer. For example, I would explicitly call

You should also draw the minimum you found on a graph with the data and such.
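For that last point, even something as small as this would help (toy data and a made-up minimizer, purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 100)
y = (x - 1.0) ** 2 + np.random.normal(0., 0.3, x.shape)  # noisy objective
x_min = 1.0  # stand-in for the minimizer the optimizer found

plt.scatter(x, y, s=10, label='data')
plt.axvline(x_min, color='red', linestyle='--', label='found minimum')
plt.legend()
plt.show()
```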
I don't like the name

Thoughts on

I would prefer a stateless function rather than an object. State is generally bad; it makes things hard to reason about. The history-storing callback should definitely be inside of Optimizer, which should probably also return the loss-minimizing x, no? I want to support random search as an optimizer because hyperparameter optimization often proceeds that way.
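A minimal sketch of that combination, i.e. a stateless function, history collected by a callback, random search, and the loss-minimizing x returned (all names hypothetical, not the PR's actual API):

```python
import numpy as np

def optimize(objective, bounds, n_iter=100, callbacks=()):
    """Stateless random-search minimization over a box."""
    low, high = bounds
    best_x, best_loss = None, np.inf
    for i in range(n_iter):
        x = np.random.uniform(low, high)
        loss = objective(x)
        if loss < best_loss:
            best_x, best_loss = x, loss
        for cb in callbacks:            # history lives in the callbacks,
            cb(i, x, loss)              # not on an Optimizer object
    return best_x, best_loss

history = []
best_x, best_loss = optimize(
    lambda x: (x - 1.0) ** 2,
    bounds=(-5.0, 5.0),
    callbacks=[lambda i, x, loss: history.append((i, x, loss))],
)
```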
@aseyboldt `Posterior` is a bad name for the base class, for several reasons:

1. It is not actually the true posterior.
2. The child class names are not consistent: compare `class MeanField(Posterior)` with `class MeanField(Approximation)`.

   > The child classes would then represent different (approximate) ways to specify a posterior

   If they are approximations, they should be associated with approximations, not with the true posterior, which we never know.
3. > rename the module to posterior.

   Maybe that will be needed in the future, when we decide to make a unified inference result.

@jsalvatier

1. > then you just need a common representation

   In my dreams, the pymc3 API will be changed a lot.
2. On the comments about the Jupyter notebook: when I return to this PR I'll bake it better.
3. > State is generally bad

   Not always; it is better for fine tuning, monitoring and debugging. If you are not satisfied with the result, there is no need to recompile step functions etc.

What about renaming `Histogram`? I considered `Histogram->Empirical`.
I'm on board with `Empirical`. We can talk about API changes on the next (first!) monthly PyMC video call.

When's the monthly call?
How about `ApproximatePosterior` or `PosteriorApproximation` then? I think the word Posterior should be in there somehow. I have to admit it took me an (embarrassingly) long time to understand what this whole module is about :-)

> Empirical: based on, concerned with, or verifiable by observation or experience rather than theory or pure logic.

How is a trace empirical? It's got nothing to do with observation or experience. The similarity to an empirical distribution (a distribution that is based on measurements) is really superficial.

I kind of like `Sampled`, but I'd prefer histogram over empirical.
@aseyboldt It's empirical in the sense that it is constructed from samples rather than from a parametric function, i.e. the observed outcomes of a sampling procedure.

@aseyboldt Do you find that convincing? @ferrine I think we should change it to Empirical for now; it's definitely an improvement over Histogram.

The name change is going to be in #2027

@twiecki I still think it's a bit strange to call draws from a posterior "empirical", but I don't feel that strongly about it. If you're all fine with it, then go ahead.

@aseyboldt I can see your point as well. What we have is analogous to a Dirichlet process, in that it represents some underlying distribution but is constructed from a sample. What do we call that?

I still cannot really wrap my head around the application. So in this case you fitted the model, got the estimated parameters, and then used them to find the function's minima? It seems to be quite different from the usual application of Bayesian Optimization.

Yeah. Here is a simple wrapper to optimize such objectives, but I do not like my API.

@ferrine I think we can come back to this when the VI API is stabilised.

Sure
Any update on this now that opvi is stable?

@springcoil we should wait until #2416 is merged. Also, maybe it is better to extend this into a project of its own (like GPflowOpt)?

I like the idea. OPVI is WIP again and comes with cool new features and speed improvements as well. I don't plan to work on this PR this month.

@ferrine I'm closing this as it seems stale; feel free to reopen if you disagree.
Bayesian optimization over the posterior latent space is an interesting sort of problem that becomes practical with this PR. A notebook with a toy example is provided.

@fonnesbeck you can find a Histogram application for SVGD there.
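From the signatures visible in the review snippets above, usage presumably looks roughly like this; everything except the `Optimizer` name and the `fit(n=..., callbacks=())` signature is guessed:

```python
import pymc3 as pm
from pymc3.optimization import Optimizer  # the module under review

with pm.Model():
    x = pm.Normal('x', mu=0., sd=1.)
    # Hypothetical objective over the latent space; the real
    # construction is demonstrated in the PR's notebook.
    objective = (x - 1.0) ** 2

opt = Optimizer(objective)       # constructor arguments are a guess
opt.fit(n=5000, callbacks=())    # signature taken from the diff above
```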