boost minibatches #2171
Conversation
@ferrine is this good from your end?

I need tests for the minibatch class.

Would be great to convert the notebooks that use the old minibatch interface over to this one as part of the PR.

Once tests pass I'll cherry-pick #2097.

So, does this replace …

Yes. This implementation is much faster at runtime.

Previously it was suggested that I change OOP inference to pm.fit. There is OOP inference in the quickstart API. I see it as suitable for the VI notebooks, so I am a bit confused about why we should hide it in the examples. It is very handy.

@ferrine it's a matter of having consistent primary interfaces across PyMC3. I'm fine with having convenience methods for advanced users, but I think we need to be showing the same approach to fitting models across the suite of notebooks that we maintain. If we think OO is the way to go, we can have that discussion and change it across the board, including for the MCMC classes. That said, I'm not opposed to having a notebook of advanced features that could show off a comprehensive example using all the handy shortcuts that you have made available.

Hm, seems like a very comprehensive …
pymc3/data.py (Outdated)

    the same thing would be but less convenient
    >>> x.shared.set_value(pm.floatX(np.random.laplace(size=(100, 100))))

    programmatic way to change storage is as following

"as follows"
pymc3/data.py (Outdated)

    if we want 1d slice of size 10 we do
    >>> x = Minibatch(data, batch_size=10)

    Note, that your data is casted to `floatX` if it is not integer type

"is cast"
pymc3/data.py (Outdated)

    >>> x = Minibatch(datagen(), batch_size=100, update_shared_f=datagen)
    >>> x.update_shared()

    To be more precise of how we get minibatch, here is a demo

"To be more concrete about how we get ..."
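To make the "how we get a minibatch" demo concrete, here is a numpy-only sketch of the idea behind the class under review: the data sits in shared storage, and every evaluation draws a fresh random slice of fixed batch size. The `NumpyMinibatch` class and its `draw` method are hypothetical names for illustration, not PyMC3's actual implementation.

```python
import numpy as np

class NumpyMinibatch:
    """Toy stand-in for pm.Minibatch: every call to draw() returns a
    fresh random slice of batch_size rows from the stored array."""
    def __init__(self, data, batch_size, seed=42):
        self.data = np.asarray(data)
        self.batch_size = batch_size
        self.rng = np.random.RandomState(seed)

    def draw(self):
        # sample row indices uniformly with replacement, mimicking
        # the random-slice semantics of the minibatch tensor
        idx = self.rng.randint(0, self.data.shape[0], size=self.batch_size)
        return self.data[idx]

data = np.random.rand(1000, 10)
mb = NumpyMinibatch(data, batch_size=100)
batch = mb.draw()   # a (100, 10) slice; consecutive draws differ
```

In the real class the slicing happens symbolically in Theano, so the minibatch tensor re-samples on every function evaluation rather than on an explicit `draw()` call.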
pymc3/data.py (Outdated)

    That's done. So if you'll need some replacements in the graph
    >>> testdata = pm.floatX(np.random.laplace(size=(1000, 10)))

    you are free to use a kind of this one as `x` is regular Theano Tensor

(I don't quite understand this sentence)
pymc3/data.py (Outdated)

    >>> moredata = np.random.rand(10, 20, 30, 40, 50)

    default total_size is then (10, 20, 30, 40, 50) but
    can be less verbose in sove cases

"some cases"
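For intuition on what `total_size` buys you: the minibatch log-likelihood is up-weighted so that it approximates the full-data log-likelihood, with a factor of prod(total sizes) / prod(batch sizes) over the subsampled dimensions. A minimal sketch of that scaling idea (the helper name `minibatch_scale` is made up for illustration, not PyMC3's internal code):

```python
import numpy as np

def minibatch_scale(total_size, batch_shape):
    # weight applied to a minibatch term so the expected sum matches
    # the full-data log-likelihood: prod(total) / prod(batch)
    return float(np.prod(total_size)) / float(np.prod(batch_shape))

# full data shaped (10, 20, 30, 40, 50), subsampled only along the
# second axis with batch size 5 -> each term is up-weighted by 4x
scale = minibatch_scale((10, 20, 30, 40, 50), (10, 5, 30, 40, 50))
```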
The downside to having the batches as tensors is that we cannot iterate over them, which is sometimes necessary. You can't even pass them to …

@fonnesbeck thanks for reviewing this :)

I think that's done after the rebase; feel free to leave feedback about the refactored notebooks.

Needs review.

In convolutional_vae_keras_advi.ipynb …
The LDA notebook explains that the variable "is automativally exponentiated (thus bounded to be positive) in advi_minibatch(), the estimation function." The reference to advi_minibatch() should be removed, or readers might be confused.

Did you also update …

Oh I see there is …
    def __next__(self):
        idx = (self.rng
               .uniform(size=self.n,
You didn't use randint() because Theano doesn't support randint(), right?
Yes, that is it
But here I've left it just for test purposes; I use numpy for this test.
It might be a reasonable option wrt consistency between numpy and theano.
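For illustration, a numpy-only sketch of the two index-sampling approaches being compared in this thread (this is not the PR's Theano code): `randint` draws integer indices directly, while the uniform-based variant scales floats in [0, 1) up to the data size and truncates, which is the trick used when a backend lacks a convenient `randint`.

```python
import numpy as np

rng = np.random.RandomState(0)
n, total = 10, 101          # batch size and number of rows

# randint-style sampling: integers directly in [0, total)
idx_a = rng.randint(0, total, size=n)

# uniform-based sampling: floats in [0, 1) scaled and truncated,
# also landing in [0, total)
idx_b = (rng.uniform(size=n) * total).astype('int64')
```

Both produce valid integer indices over the full data, so either can drive the random slicing; the numpy test in the PR uses the uniform form for consistency with the Theano graph.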
I had a close look at this PR and I believe it's ready to merge. Some tests failed, but those don't seem to be related to the PR. Actually, …
I'm getting a dimension mismatch with the following model:

    data = np.random.rand(101, 202)
    data_t = pm.Minibatch(data, [10, 11])
    with pm.Model() as model:
        U = pm.Normal('U', 0, 1, shape=(101, 3))
        V = pm.Normal('V', 0, 1, shape=(202, 3))
        reads = pm.Normal('reads', mu=pm.math.dot(U, V.T), observed=data_t,
                          total_size=data.shape)

but it works when I replace …
@pwl I think the correct usage goes something like this?

    data = np.random.rand(101, 202)
    data_t = pm.Minibatch(data, [10, 11])
    with pm.Model() as model:
        U = pm.Normal('U', 0, 1, shape=(10, 3), total_size=(101, 3))
        V = pm.Normal('V', 0, 1, shape=(11, 3), total_size=(202, 3))
        reads = pm.Normal('reads', mu=pm.math.dot(U, V.T), observed=data_t,
                          total_size=data.shape)
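A quick numpy check of why the shapes line up in this version: the minibatch slice of the (101, 202) data is (10, 11), and `dot(U, V.T)` with minibatch-sized factors U (10, 3) and V (11, 3) has exactly that shape, so the observed likelihood broadcasts cleanly.

```python
import numpy as np

rng = np.random.RandomState(1)
U = rng.normal(size=(10, 3))    # minibatch-sized row factors
V = rng.normal(size=(11, 3))    # minibatch-sized column factors
mu = U.dot(V.T)                 # (10, 3) @ (3, 11) -> (10, 11)
# matches the (10, 11) minibatch slice of the (101, 202) data matrix
```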
@junpenglao thanks! I was just testing that; it seems that it works. How can I extract the full extent of …

EDIT: just to clarify, I'm doing

    data = np.random.rand(101, 202)
    data_t = pm.Minibatch(data, [10, 11])
    with pm.Model() as model:
        U = pm.Normal('U', 0, 1, shape=(10, 3), total_size=(101, 3))
        V = pm.Normal('V', 0, 1, shape=(11, 3), total_size=(202, 3))
        reads = pm.Normal('reads', mu=pm.math.dot(U, V.T), observed=data_t,
                          total_size=data.shape)

    advi = pm.ADVI()
    approx = advi.fit(1)
    trace = approx.sample(1)
    trace['U'].shape  # gives (1, 10, 3)
Ok, I think I misunderstood the interface and somehow ignored the need to add indices by hand, as in

    n_i = 101
    n_j = 202
    data = np.random.rand(n_i, n_j)
    data_t = pm.Minibatch(data, [10, 11])
    data_i_idx = pm.Minibatch(range(n_i), 10)
    data_j_idx = pm.Minibatch(range(n_j), 11)
    with pm.Model() as model:
        U = pm.Normal('U', 0, 1, shape=(n_i, 3))[data_i_idx, :]
        V = pm.Normal('V', 0, 1, shape=(n_j, 3))[data_j_idx, :]
        reads = pm.Normal('reads', mu=pm.math.dot(U, V.T), observed=data_t,
                          total_size=data.shape)

    advi = pm.ADVI()
    approx = advi.fit(10000)
    trace = approx.sample(100)
    trace['U'].shape  # works fine now: (100, 101, 3)

I had the impression that the indexing would be done automatically under the hood.
@ferrine thanks for this great PR, this really simplifies and unifies the minibatch training!

@pwl I would also recommend setting different random seeds on different dimensions. Otherwise you get correlated observations. You can find a reference in the docstring.
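A numpy illustration of the correlated-observations point (hypothetical sketch, not PyMC3 code): if the row-index and column-index generators share a seed, they emit identical index sequences, so every sampled cell lies on the "diagonal" (i, i) of the permuted data instead of covering a uniform grid; different seeds decorrelate the draws.

```python
import numpy as np

n = 100
# same seed for rows and columns -> identical index sequences,
# i.e. fully correlated sampling along the diagonal
rows       = np.random.RandomState(42).randint(0, n, size=10)
cols_same  = np.random.RandomState(42).randint(0, n, size=10)

# a different seed for the second dimension decorrelates the draws
cols_diff  = np.random.RandomState(43).randint(0, n, size=10)
```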
@junpenglao I haven't played with multidimensional training yet either. But for 1d I found this very convenient.

@ferrine We should extend the probabilistic matrix factorization example to demonstrate the multi-dimensional minibatch training.

Yes, that is a good idea.

@ferrine thanks for the minibatch implementation -- great PR. I am not quite familiar with the guts of PyMC3 yet, but looking at @pwl's matrix factorization model, I wonder whether there is a guarantee that …

Hi, …

Thank you very much for your comment @ferrine. I wonder, perhaps it would be even more error-proof if it were possible to extract the slicing indices from the data minibatch directly. @pwl's solution, involving multiple minibatches on linear ranges for each dimension, leaves room for strange bugs down the road (what if a future version of Theano decides to evaluate some tensors twice?). Is there an elegant solution to having a tuple of integer 1D tensors inside the minibatch class, one for each dimension, that would always be in sync with the slice …

Having that at the instance level is not a solution, since you have different minibatches for different tensors. A class-level solution is needed here. I see it is not easy to implement, and as you have a workaround it is not worth it at all.
- support multidimensional subsampling (Possible extensions of total_size #2125)
- add minibatch interface (API proposal: container or data class #2056)
- tests
- notebooks