
boost minibatches #2171


Merged: 36 commits merged into pymc-devs:master on Jun 12, 2017

Conversation

@ferrine (Member) commented May 10, 2017

@twiecki (Member) commented May 12, 2017

@ferrine is this good from your end?

@ferrine (Member, Author) commented May 13, 2017

I need tests for the minibatch class.
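Such tests might check that a random slice has the requested shape and only contains elements of the original data. A minimal numpy-only sketch of that kind of property test (the helper random_slice is hypothetical, not the PR's API):

```python
import numpy as np

def random_slice(data, batch_sizes, rng):
    """Hypothetical stand-in for Minibatch: draw an independent
    random index vector per dimension and return the sub-array."""
    idx = [rng.randint(0, dim, size=b)
           for dim, b in zip(data.shape, batch_sizes)]
    return data[np.ix_(*idx)]

def test_random_slice_shape_and_bounds():
    rng = np.random.RandomState(42)
    data = np.arange(101 * 202).reshape(101, 202)
    batch = random_slice(data, (10, 11), rng)
    assert batch.shape == (10, 11)       # requested minibatch shape
    assert np.all(np.isin(batch, data))  # every element comes from data

test_random_slice_shape_and_bounds()
```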

@fonnesbeck (Member):

Would be great to convert the notebooks that use the old minibatch interface over to this one as part of the PR.

@ferrine (Member, Author) commented May 27, 2017

Once the tests pass I'll cherry-pick #2097

@fonnesbeck (Member):

So, does this replace DataSampler?

@ferrine (Member, Author) commented May 27, 2017

Yes. This implementation is much faster at runtime.
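The speed difference comes from the design: instead of feeding batches through a Python generator like DataSampler, Minibatch keeps the full dataset in one shared storage and yields a fresh random slice on every graph evaluation. A rough numpy-only sketch of that idea (MinibatchSketch is illustrative, not the actual Theano-backed implementation):

```python
import numpy as np

class MinibatchSketch:
    """Numpy-only sketch of the Minibatch idea: data lives in one
    storage array; each evaluation draws a new random sub-tensor."""
    def __init__(self, data, batch_size, seed=None):
        self.storage = np.asarray(data)  # analogue of the shared variable
        self.batch_size = batch_size
        self.rng = np.random.RandomState(seed)

    def set_value(self, data):
        # analogue of x.shared.set_value(...): swap the storage in place
        self.storage = np.asarray(data)

    def eval(self):
        # one random 1d slice of size `batch_size` along axis 0
        idx = self.rng.randint(0, self.storage.shape[0], size=self.batch_size)
        return self.storage[idx]

x = MinibatchSketch(np.random.rand(1000, 10), batch_size=100)
assert x.eval().shape == (100, 10)
```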

@ferrine (Member, Author) commented May 27, 2017

DataSampler is used only in tests. I think we are free to remove it before #2223

@ferrine (Member, Author) commented May 28, 2017

Previously it was suggested that I change OOP inference to pm.fit. There is OOP inference in the quick-start API, and I see it as suitable for the VI notebooks, so I am a bit confused about why we should hide it in the examples. It is very handy.
Thus I see the further work as introducing the new minibatches. Any other thoughts?

@fonnesbeck (Member):

@ferrine it's a matter of having consistent primary interfaces across PyMC3. I'm fine with having convenient methods for advanced users, but I think we need to be showing the same approach to fitting models among the suite of notebooks that we maintain. If we think OO is the way to go, we can have that discussion and change it across the board, including for the MCMC classes.

That said, I'm not opposed to having a notebook of advanced features that could show off a comprehensive example that uses all the handy shortcuts that you have made available.

@ferrine (Member, Author) commented May 30, 2017

Hm, it seems like a very comprehensive __init__ and a clone method can go together.

pymc3/data.py Outdated
the same thing would be but less convenient
>>> x.shared.set_value(pm.floatX(np.random.laplace(size=(100, 100))))

programmatic way to change storage is as following
Suggestion: as follows

pymc3/data.py Outdated
if we want 1d slice of size 10 we do
>>> x = Minibatch(data, batch_size=10)

Note, that your data is casted to `floatX` if it is not integer type
Suggestion: is cast

pymc3/data.py Outdated
>>> x = Minibatch(datagen(), batch_size=100, update_shared_f=datagen)
>>> x.update_shared()

To be more precise of how we get minibatch, here is a demo
Suggestion: To be more concrete about how we get ...

pymc3/data.py Outdated
That's done. So if you'll need some replacements in the graph
>>> testdata = pm.floatX(np.random.laplace(size=(1000, 10)))

you are free to use a kind of this one as `x` is regular Theano Tensor
Comment: (I don't quite understand this sentence)

pymc3/data.py Outdated
>>> moredata = np.random.rand(10, 20, 30, 40, 50)

default total_size is then (10, 20, 30, 40, 50) but
can be less verbose in sove cases
Suggestion: some cases

@fonnesbeck (Member) commented Jun 1, 2017

The downside to having the batches as tensors is that we cannot iterate over them, which is sometimes necessary. You can't even pass them to scan as a sequence.

@ferrine (Member, Author) commented Jun 1, 2017

@fonnesbeck thanks for reviewing this :)
The case of iterating over minibatches is not covered, of course. But scan is fine: you can apply arbitrary Theano operations to this minibatch. It is sampled once per graph execution.
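The "sampled once per graph execution" behaviour can be illustrated outside Theano: within one evaluation every downstream op sees the same random slice, and a new slice is drawn only on the next evaluation. A numpy sketch of that assumed behaviour (not the Theano mechanics):

```python
import numpy as np

def evaluate(data, batch_size, rng):
    """One 'graph execution': the random slice is drawn once and
    every downstream op consumes that same slice."""
    idx = rng.randint(0, data.shape[0], size=batch_size)
    batch = data[idx]
    return batch.sum(), batch.mean()  # both ops see the identical sample

rng = np.random.RandomState(0)
data = np.random.rand(1000)
s, m = evaluate(data, 10, rng)
assert np.isclose(s / 10, m)  # consistent within one evaluation
```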

@ferrine (Member, Author) commented Jun 6, 2017

I think that's done after the rebase; feel free to leave feedback on the refactored notebooks.

@ferrine (Member, Author) commented Jun 7, 2017

Need review

@taku-y (Contributor) commented Jun 9, 2017

In convolutional_vae_keras_advi.ipynb, the import "from pymc3.variational import advi_minibatch" is unused.

@taku-y (Contributor) commented Jun 9, 2017

The LDA notebook still explains that a variable "is automatically exponentiated (thus bounded to be positive) in advi_minibatch(), the estimation function." The reference to advi_minibatch() should be removed, or readers might be confused.

@junpenglao (Member):

Did you also update test_advi.py? It's still testing the old ADVI API.

@junpenglao (Member):

Oh I see, there is test_variational_inference.py for the new API.


def __next__(self):
idx = (self.rng
.uniform(size=self.n,
Contributor:

You didn't use randint() because Theano doesn't support randint(), right?

Member Author:

Yes, that is it

@ferrine (Member, Author) commented Jun 12, 2017

But here I've left it just for testing purposes; I use NumPy for this test.

Contributor:

It might be a reasonable option with respect to consistency between NumPy and Theano.
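The uniform() trick under discussion works because scaling uniform draws on [0, 1) by the data length and truncating yields integers distributed like randint(0, high). A numpy sketch of the equivalence (uniform_randint is an illustrative name, not an API function):

```python
import numpy as np

def uniform_randint(rng, high, size):
    # floor(uniform[0, 1) * high) is distributed like randint(0, high)
    return (rng.uniform(size=size) * high).astype(np.int64)

rng = np.random.RandomState(0)
idx = uniform_randint(rng, high=1000, size=100)
assert idx.dtype == np.int64
assert idx.min() >= 0 and idx.max() < 1000  # always valid indices
```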

@taku-y (Contributor) commented Jun 12, 2017

I had a close look at this PR and I believe it's ready to merge. Some tests failed, but those don't seem to be related to this PR; in fact, test_minibatches.py passed 100%. I will merge it unless I hear otherwise.

@taku-y taku-y merged commit e8da421 into pymc-devs:master Jun 12, 2017
@pwl commented Jun 12, 2017

I'm getting dimension mismatch with the following model:

import numpy as np
import pymc3 as pm

data = np.random.rand(101, 202)
data_t = pm.Minibatch(data, [10, 11])
with pm.Model() as model:
    U = pm.Normal('U', 0, 1, shape=(101, 3))
    V = pm.Normal('V', 0, 1, shape=(202, 3))
    reads = pm.Normal('reads', mu=pm.math.dot(U,V.T), observed=data_t,
                      total_size=data.shape)
ValueError: Input dimension mis-match. (input[0].shape[0] = 10, input[1].shape[0] = 101)

but it works when I replace data_t with data.
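The error is a pure shape problem: with full-size U and V, pm.math.dot(U, V.T) has the full shape (101, 202), while the observed minibatch is only (10, 11). The numpy equivalent of the shapes involved:

```python
import numpy as np

U = np.zeros((101, 3))      # full-size factor
V = np.zeros((202, 3))      # full-size factor
mu = U @ V.T                # shape (101, 202): the full matrix
batch = np.zeros((10, 11))  # minibatch-sized observed data

assert mu.shape == (101, 202)
assert batch.shape == (10, 11)
assert mu.shape != batch.shape  # hence the dimension mismatch
```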

@junpenglao (Member):

@pwl I think the correct usage goes something like this?

data = np.random.rand(101, 202)
data_t = pm.Minibatch(data, [10,11])
with pm.Model() as model:
    U = pm.Normal('U', 0, 1, shape=(10, 3), total_size=(101, 3))
    V = pm.Normal('V', 0, 1, shape=(11, 3), total_size=(202, 3))
    reads = pm.Normal('reads', mu=pm.math.dot(U,V.T), observed=data_t,
                      total_size=data.shape)

@pwl commented Jun 12, 2017

@junpenglao thanks! I was just testing that; it seems to work. How can I extract the full extent of U and V after fitting the model? sample returns just the minibatch-sized chunk of U.

EDIT: just to clarify, I'm doing

data = np.random.rand(101, 202)
data_t = pm.Minibatch(data, [10,11])
with pm.Model() as model:
    U = pm.Normal('U', 0, 1, shape=(10, 3), total_size=(101, 3))
    V = pm.Normal('V', 0, 1, shape=(11, 3), total_size=(202, 3))
    reads = pm.Normal('reads', mu=pm.math.dot(U,V.T), observed=data_t,
                      total_size=data.shape)
    advi = pm.ADVI()
    approx = advi.fit(1)
    trace = approx.sample(1)
trace['U'].shape # gives (1, 10, 3)

@junpenglao (Member):

@pwl I am not sure; this is a brand new feature and I haven't had a chance to play with it yet. @ferrine?

@pwl commented Jun 12, 2017

Ok, I think I misunderstood the interface and somehow overlooked the need to add the indices by hand, as in

n_i = 101
n_j = 202
data = np.random.rand(n_i, n_j)
data_t = pm.Minibatch(data, [10,11])
data_i_idx = pm.Minibatch(range(n_i),10)
data_j_idx = pm.Minibatch(range(n_j),11)
with pm.Model() as model:
    U = pm.Normal('U', 0, 1, shape=(n_i, 3))[data_i_idx,:]
    V = pm.Normal('V', 0, 1, shape=(n_j, 3))[data_j_idx,:]
    reads = pm.Normal('reads', mu=pm.math.dot(U,V.T), observed=data_t,
                      total_size=data.shape)
    advi = pm.ADVI()
    approx = advi.fit(10000)
    trace = approx.sample(100)
trace['U'].shape # works fine now: (100, 101, 3)

I was under the impression that the indexing would be done automatically under the hood.
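The manual-index pattern above can be mimicked in plain numpy: draw one index vector per dimension, then use the same vectors to slice both the parameters and the data, so everything stays aligned. A sketch (the variable names rng_i, rng_j, U_full, V_full are illustrative):

```python
import numpy as np

rng_i = np.random.RandomState(1)  # separate stream per dimension
rng_j = np.random.RandomState(2)

data = np.random.rand(101, 202)
U_full = np.random.rand(101, 3)
V_full = np.random.rand(202, 3)

i = rng_i.randint(0, 101, size=10)  # row indices, like data_i_idx
j = rng_j.randint(0, 202, size=11)  # column indices, like data_j_idx

U = U_full[i, :]                    # (10, 3)
V = V_full[j, :]                    # (11, 3)
batch = data[np.ix_(i, j)]          # (10, 11), same rows and columns

assert (U @ V.T).shape == batch.shape  # now the shapes line up
```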

@pwl commented Jun 12, 2017

@ferrine thanks for this great PR, this really simplifies and unifies the minibatch training!

@junpenglao junpenglao mentioned this pull request Jun 12, 2017
@ferrine (Member, Author) commented Jun 12, 2017

@pwl I would also recommend setting different random seeds for the different dimensions; otherwise you get correlated observations. You can find a reference in the docstring.
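The seed advice matters because two index streams started from the same seed produce identical underlying draws, so row and column selections become perfectly correlated. A small numpy illustration:

```python
import numpy as np

same_a = np.random.RandomState(42).uniform(size=10)
same_b = np.random.RandomState(42).uniform(size=10)
diff = np.random.RandomState(7).uniform(size=10)

# identical seeds -> identical draws -> correlated row/column indices
assert np.array_equal(same_a, same_b)
# distinct seeds -> independent draws
assert not np.array_equal(same_a, diff)
```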

@ferrine ferrine deleted the boost_minibatches branch June 12, 2017 20:08
@ferrine (Member, Author) commented Jun 12, 2017

@junpenglao I haven't played with multi-dimensional training yet either, but for 1d I found this very convenient.

@junpenglao (Member) commented Jun 12, 2017

@ferrine We should extend the probabilistic matrix factorization example to demonstrate multi-dimensional minibatch training.

@ferrine (Member, Author) commented Jun 12, 2017

Yes, that's a good idea.

@mbabadi commented Sep 19, 2017

@ferrine thanks for the minibatch implementation -- great PR. I am not yet familiar with the guts of PyMC3, but looking at @pwl's matrix factorization model, I wonder whether there is a guarantee that data_t, data_i_idx, and data_j_idx are evaluated only once during each round of gradient/ELBO evaluation? If one of them falls behind (or moves ahead), the game is ruined.

@ferrine (Member, Author) commented Sep 22, 2017

Hi,
Random indices are evaluated once per function evaluation, by design of Theano. As for a manual minibatch.eval() or independent usage within several functions, that is something I don't know how to control; the user has to take care of it. Personally, I prefer passing minibatches to inference as replacements, which is less error-prone.

@mbabadi commented Sep 25, 2017

Thank you very much for your comment @ferrine.

I wonder whether it would be even more error-proof if it were possible to extract the slicing indices from the data minibatch directly. @pwl's solution involving multiple minibatches over linear ranges for each dimension leaves room for strange bugs down the road (what if a future version of Theano decides to evaluate some tensors twice?).

Is there an elegant way to keep a tuple of integer 1d tensors inside the minibatch class, one for each dimension, that would always be in sync with the slice minibatch.eval() would yield, and that the user could use or evaluate without the side effect of changing the state of the RNG?
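One way such an object could work: draw one integer index vector per dimension on each evaluation and expose both the indices and the resulting slice, so they can never fall out of sync. A hypothetical numpy sketch (SyncedMinibatch is not part of PyMC3):

```python
import numpy as np

class SyncedMinibatch:
    """Hypothetical sketch: the per-dimension index vectors and the
    data slice come from the same draw, so they stay in sync."""
    def __init__(self, data, batch_sizes, seed=None):
        self.data = np.asarray(data)
        self.batch_sizes = batch_sizes
        self.rng = np.random.RandomState(seed)
        self.indices = None  # tuple of 1d integer arrays, one per dim

    def eval(self):
        # one draw per evaluation; indices are kept for the caller
        self.indices = tuple(
            self.rng.randint(0, dim, size=b)
            for dim, b in zip(self.data.shape, self.batch_sizes))
        return self.data[np.ix_(*self.indices)]

mb = SyncedMinibatch(np.random.rand(101, 202), (10, 11), seed=0)
batch = mb.eval()
assert batch.shape == (10, 11)
# the exposed indices reproduce exactly the slice that was returned
assert np.array_equal(batch, mb.data[np.ix_(*mb.indices)])
```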

@ferrine (Member, Author) commented Sep 25, 2017

Having that at the instance level is not a solution, since you have different minibatches for different tensors; a class-level solution is needed here. I see it is not easy to implement, and since you have a workaround, it is not worth it at all.

7 participants