Submit BEST #1517
Conversation
Really cool! Looks good to me.
@@ -0,0 +1,141 @@
from ..distributions import StudentT, Exponential, Uniform, HalfCauchy
Why call the dir modules and not models?
That's because there's a models.py present in the pymc3/ directory. I'm not sure what a better name would be, though. Do you have a suggestion?
Well... It looks like I'm blind when coding at 3 am (in Vienna). I just realized it's model.py, not models.py, so I'll change the directory name to models.
self.data['indices'] = self.data[self.sample_col].apply(
    lambda x: sample_names[x])

# print(self.data)
stray debug
Fixed.
This is a great start. It warrants thinking deeply about the API, choice of inference, etc., but we might not be able to really specify that at this point, so we can move ahead here. One key trick is to use summary statistics for the testvals to help with initialization. See e.g. https://github.com/quantopian/pyfolio/blob/master/pyfolio/bayesian.py#L194
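A minimal sketch of that trick, assuming a two-group BEST-style model in PyMC3 (variable names, priors, and toy data here are illustrative, not the PR's actual code):

```python
import numpy as np
import pymc3 as pm

# Toy data for two groups.
y1 = np.random.normal(1.0, 1.0, size=30)
y2 = np.random.normal(1.3, 1.5, size=30)

with pm.Model():
    # Seed each free variable with a summary statistic of the data it
    # describes, so that inference starts from a sensible point.
    group1_mean = pm.Normal('group1_mean', mu=y1.mean(), sd=10, testval=y1.mean())
    group2_mean = pm.Normal('group2_mean', mu=y2.mean(), sd=10, testval=y2.mean())
    group1_std = pm.Uniform('group1_std', lower=0.01, upper=10, testval=y1.std())
    group2_std = pm.Uniform('group2_std', lower=0.01, upper=10, testval=y2.std())
```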
@twiecki thanks for the feedback! On the API design: would it be worthwhile inserting a comment block at the top of the module to clarify how the API is currently designed and its rationale, and leaving a note for future contributors that they'd be free to modify the API if they see a better fit? Not sure what PyMC3's current practice is for this, so I'd love to get feedback.
ping @twiecki: any comment on leaving a note for future contributors?
Hi Eric,

Peadar Coyle
@springcoil all done. Thanks for the feedback! The final commit was just adding a docstring at the top of the BEST file, so all tests should pass.
@ericmjl This looks great. The main thing is I think this should use NUTS, not ADVI. I have a PR that I think would fit well here, can you hold tight a bit?
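For context, switching the fit from ADVI to NUTS in PyMC3 mostly means calling pm.sample inside the model context; a minimal sketch (the model object and draw count are placeholders):

```python
import pymc3 as pm

with model:
    # NUTS is PyMC3's default step method for continuous variables,
    # so a plain pm.sample call can replace the ADVI-based fit.
    trace = pm.sample(2000)

pm.summary(trace)
```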
I want to implement some glm models too, and we surely need a base class to share essential functionality, see #1524.
@ferrine I put a note outlining the API design at the top of the BEST module. To repeat it here briefly: I thought first about how the data should be structured in a Pandas dataframe in order to make the code most concise, and then designed the API around that. From my limited experience, statisticians like it best when data are provided structured and cleaned, with one row representing one "observation". Maybe that'd be a starting point? Actually, I'll move this to the other thread; we can discuss there.
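For concreteness, the tidy layout described above might look like the following (column names and the constructor call are only illustrative of the sample_col/output_col idea, not the PR's fixed API):

```python
import pandas as pd

# One row per observation: which group it came from, and the measured value.
df = pd.DataFrame({
    'sample': ['drug', 'drug', 'drug', 'placebo', 'placebo'],
    'output': [101.3, 98.7, 102.4, 99.2, 100.1],
})

# Hypothetical usage:
# best = BEST(df, sample_col='sample', output_col='output')
```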
I see that there is some work left. I tried to do my best in reviewing it.
mean_test = self.data.groupby('indices').mean()[self.output_col].values
sd_test = self.data.groupby('indices').std()[self.output_col].values

with Model() as model:
This adds limitations to usage; it will not be possible to use the model within another model. Am I right, @twiecki?
self.params = params
self.model = model

def plot_posterior(self, rotate_xticks=False):
Should also return axes for further user modifications
Great point, I'll update the PR with a fix for this.
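A minimal sketch of what returning the axes could look like (the plotting body is elided; this is not the PR's actual implementation):

```python
import matplotlib.pyplot as plt

def plot_posterior(self, rotate_xticks=False):
    """Plot posterior summaries and return the axes for further user tweaking."""
    fig, ax = plt.subplots()
    # ... draw the posterior summaries onto `ax` here ...
    if rotate_xticks:
        plt.setp(ax.get_xticklabels(), rotation=90)
    return ax
```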
self.data['indices'] = self.data[self.sample_col].apply(
    lambda x: sample_names[x])

def fit(self, n_steps=500000):
It is not possible to customize the ADVI fit :(
I plan to change this to use @twiecki's sample_init function when it's merged in. I'll then wrap the kwargs.
I think that's a good idea; it is, after all, what @twiecki's sample_init function is designed for.
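A sketch of the kwargs-forwarding idea, written against pm.sample since sample_init's final signature isn't settled in this thread (method name, defaults, and the trace attribute are assumptions):

```python
import pymc3 as pm

def fit(self, n_steps=2000, **sample_kwargs):
    """Fit the model, forwarding extra keyword arguments to the sampler."""
    with self.model:
        self.trace = pm.sample(n_steps, **sample_kwargs)
    return self
```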
self._convert_to_indices()

def _convert_to_indices(self):
I think it is better to write a static function without side effects. BTW, it seems to be modifying the original dataframe, which is bad for the user:
import pandas as pd

class O:
    def __init__(self, data):
        self.data = data

    def meth(self):
        # Mutates the dataframe that was passed in, not a copy.
        self.data['a'] = 1

data = pd.DataFrame()  # empty
o = O(data)
o.meth()
data  # the caller's dataframe now has a new column 'a'
Agreed, the side effects here concern me.
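One way to avoid the mutation (a sketch, not the PR's code) is to keep the index computation as a pure helper and work on a defensive copy of the dataframe:

```python
import numpy as np

def make_sample_indices(data, sample_col):
    """Return a Series of integer group indices without touching `data`."""
    names = {name: i for i, name in enumerate(np.unique(data[sample_col].values))}
    return data[sample_col].map(names)

# Inside the class, copy first so the caller's dataframe is never modified:
# self.data = data.copy()
# self.data['indices'] = make_sample_indices(self.data, self.sample_col)
```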
import pandas as pd


class BEST(object):
As for the base class, we should consider using Lasagne's base ideas. I'll do a sketch for that soon.
sample_names = dict()
for i, name in enumerate(
        list(np.unique(self.data[self.sample_col].values))):
    print('Sample name {0} has the index {1}'.format(name, i))
I suggest using the logging facility for this so that it can be turned off (by default) and on as needed.
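A sketch of the logging-based version of that loop (the logger name is an arbitrary choice; np and self come from the surrounding module):

```python
import logging

logger = logging.getLogger(__name__)

for i, name in enumerate(np.unique(self.data[self.sample_col].values)):
    # Silent by default; visible when the user turns on DEBUG logging.
    logger.debug('Sample name %s has the index %s', name, i)
```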
As slick as the sklearn-like API is, it is not consistent with other fitting methods in PyMC3, which are functions (e.g. sample) rather than class methods.
@ferrine: let's see how the discussion pans out here; I can see the reasons behind @fonnesbeck's concerns. If reception here looks good, I'm happy to work on it; otherwise, let's respect the leads and work on the idea in a separate repo instead. How does that sound?
The sklearn API suggestion was only an idea, feel free to abandon it.
It's not a bad idea at all, but my inclination is to keep the interface consistent across models, unless there is a good reason to deviate from it. If the sklearn API is something we like, then we should look for ways of moving to it across the package, and not just for new fitting methods.
I agree with @fonnesbeck about consistency across the API, in terms of the interface across models. I'm a huge fan of the sklearn API, but I would consider something like this to be too different.
I am just starting to join the discussion, so please feel free to tell me if I'm either stating obvious or irrelevant things. I was looking for ways to build models in a modular fashion from simpler components (see #1553). I do like the approach here. In particular, with respect to @fonnesbeck's comment, I see value in the object-oriented ("lasagne") approach. I do not care that much where the fitting function is found; for consistency that could very well be the usual PyMC3 functions. But I do like the concept of an object that encapsulates related variables of a particular (sub-)model, which can be fitted to data and linked to other models. I imagine that such a specialised model class is a stand-in for a set of related random variables, while instances describe a particular binding of such related random variables to actual data and possibly external random variables. The object provides convenient access to these variables and possibly provides additional related functionality, like domain-specific insight in picking start values.
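A rough sketch of the kind of encapsulation described above (class name, priors, and toy data are purely illustrative, not a concrete API proposal):

```python
import numpy as np
import pymc3 as pm

class GroupEstimate(object):
    """Bundle the related random variables for one group's observations."""
    def __init__(self, name, observed):
        self.mean = pm.Normal(name + '_mean', mu=observed.mean(), sd=10,
                              testval=observed.mean())
        self.std = pm.HalfCauchy(name + '_std', beta=1, testval=observed.std())
        self.obs = pm.Normal(name + '_obs', mu=self.mean, sd=self.std,
                             observed=observed)

drug = np.random.normal(101, 2, size=30)      # toy data
placebo = np.random.normal(100, 2, size=30)   # toy data

with pm.Model() as model:
    # Two sub-model objects created inside one shared model context.
    g1 = GroupEstimate('drug', drug)
    g2 = GroupEstimate('placebo', placebo)
    diff = pm.Deterministic('difference_of_means', g1.mean - g2.mean)
```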
@ericmjl Just wondering if you are interested in refactoring this PR.
@twiecki yep, still interested, just lacking time right now. If you need to clean up PRs that are not active, I'm cool with closing it for now.
No that's fine, just wondering if it was still active.
Wondering @ericmjl if you've time to get this over the line?
@springcoil I think it's low on my priority list right now - thesis defence, job hunt, prep for PyCon. I'll go ahead and close.
Referencing #1502, here’s the PR for it. Have provided .fit() and .plot_posterior() functions.