Skip to content

ADVI not properly sampling from Bounded distributions #1170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
aasensio opened this issue Jun 9, 2016 · 20 comments
Closed

ADVI not properly sampling from Bounded distributions #1170

aasensio opened this issue Jun 9, 2016 · 20 comments

Comments

@aasensio
Copy link
Contributor

aasensio commented Jun 9, 2016

I'm trying to sample from a complex hierarchical model with lots of variables. I'm thinking of applying ADVI to sample from the posterior because the model has potentially many variables. One of the variables has a bounded prior. In the process of learning how to write user-defined bounded distributions, I did some tests with simple distributions and, unless I'm doing something wrong, it looks that ADVI is not able to properly sample from bounded distributions. Here is the code I'm using:

with pm.Model() as model:
    boundedNormal = pm.Bound(pm.Normal, upper=1.0)
    th = boundedNormal('th', mu=0.0, sd=1.0)
    v_params = pm.variational.advi(n=50000)
    traceADVI = pm.variational.sample_vp(v_params, draws=5000)

with pm.Model() as model:
    boundedNormal = pm.Bound(pm.Normal, upper=1.0)
    th = boundedNormal('th', mu=0.0, sd=1.0)
    traceNUTS = pm.sample(2000, pm.NUTS())

pm.traceplot(traceADVI)
pm.traceplot(traceNUTS)

The samples from the NUTS alg are shown in the following
figure_2

while those of ADVI are shown in the following:
figure_1-3

@twiecki
Copy link
Member

twiecki commented Jun 9, 2016

ADVI requires your model to live in an unconstrained space. We usually accomplish this via auto-transformations. It seems that Bound() does not add a transformed RV to the model (which maybe we should add as it's a bit confusing like this).

However, if you transformed this RV manually it should probably work. Just need to figure out how to transform :).

@aasensio
Copy link
Contributor Author

aasensio commented Jun 9, 2016

Yes, I'll do the transformation myself. However, the question remains of why NUTS takes this auto-transformation into account while ADVI does not. I'll have a look at the code and try to understand why this happens

@fonnesbeck
Copy link
Member

As with MCMC, we should not be reporting auto-transformed variables to users at all, and ADVI needs to report back-transformed variables (and all named deterministics, for that matter). Hopefully #1168 will help the first part of this).

@taku-y
Copy link
Contributor

taku-y commented Jun 9, 2016

I have implemented sample_vp(). To fix this function, I need to know how Bound() works internally, specifically, the way for keeping RVs with Bound() and for construction of the theano graph. In the current implementation, sample_vp() can detect automatic transformed RVs (e.g., Dirichlet). I think that the detection (and back-transformation) of RVs transformed with Bound() requires a different way. I'm busy this weekend, so I will work on this problem next week.

@twiecki
Copy link
Member

twiecki commented Jun 10, 2016

@taku-y if Bound() were auto-transforming this would be no problem though, right?

@twiecki
Copy link
Member

twiecki commented Jun 10, 2016

@fonnesbeck The problem is that the model really only cares about the transformed variables. without a mechanism where we supply untransformed values which then get transformed to the real line I think we would just keep plastering over things.

@fonnesbeck
Copy link
Member

I realize that the model needs to see different things than the user. Its more a matter of what is presented to the user, versus what gets used by the algorithm. We should be able to have these two interfaces to the same object.

@taku-y
Copy link
Contributor

taku-y commented Jun 10, 2016

I thought Bound() was a transformation, but actually it works as a discontinuous operation such that logp returns -inf out of the specified range of the value. So it should not be used with ADVI. I tested the original code of @aasensio and ELBO values were -inf during optimization.

@aasensio I think pymc3.distributions.transforms.interval(lower, upper) solves the original problem. This transforms real coordinate space into a bounded space.

@twiecki Bound() is a discontinuous operation. If possible, ADVI should check such operations before execution. But I can't think of how to implement this.

@aasensio
Copy link
Contributor Author

@taku-y @twiecki Would it be enough then if Bound() just adds the appropriate transformation to the distribution class? I'm doing some tests and having problems with the testval (probably because I still don't really understand the code).
My feeling is that it would be more efficient both for ADVI and from the sampler points of view to transform the variables, instead of discarding samples when the values are outside the boundaries. This would heavily increase the efficiency, I guess.

@taku-y
Copy link
Contributor

taku-y commented Jun 14, 2016

@aasensio I made an example notebook (https://gist.github.com/taku-y/aa4503452b5573c458c8c3fc32af0919). Does this result make sense?

In the notebook, MyBound class is defined, which supports transformation. I think this class should be added to the library.

As you said, the sampler would work more efficiently with transformations defined in MyBound, i.e., without rejection in the Metropolis-Hastings procedure, though I did not test that.

@aasensio
Copy link
Contributor Author

@taku-y It looks like this is the way to go. I did exactly as you did in your example notebook, but this is not always safe. In particular, it is unsafe if the testval of the distribution is not inside the ranges of the interval. For instance, it does not work if you bound the distribution to the interval [0,1]. Any idea how to solve it?

@taku-y
Copy link
Contributor

taku-y commented Jun 14, 2016

@aasensio Do you think of initialization of the random variable? I think we can check the testval and replace it a value within the range. It seems to be easily solved, but I might misunderstand what you said.

@aasensio
Copy link
Contributor Author

Your code works if a suitable testval is passed to the distribution. @twiecki @fonnesbeck I don't know if this testval should be tested to be inside the domain of the bounded distribution or you want to let it be defined by the user.

@twiecki
Copy link
Member

twiecki commented Jun 20, 2016

@taku-y That looks great. Do you think that could replace the current Bound implementation? Want to do a PR?

@aasensio I think it makes sense to test that the testval is inside the support.

@aasensio
Copy link
Contributor Author

I agree that the Bound's implementation of @taku-y could replace the current one. However, I would add a couple more transformations to deal with cases in which only the upper or only the lower bounds are provided. I would suggest something in the line of

class LowerBound(ElemwiseTransform):
    """Transform from real line interval [a,inf] to whole real line."""

    name = "lowerbound"

    def __init__(self, a):
        self.a = a

    def backward(self, x):
        a = self.a
        r = tt.exp(x) + a
        return r

    def forward(self, x):
        a = self.a
        r = tt.log(x - a)
        return r

lowerbound = LowerBound


class UpperBound(ElemwiseTransform):
    """Transform from real line interval [-inf,b] to whole real line."""

    name = "upperbound"

    def __init__(self, b):
        self.b = b

    def backward(self, x):
        b = self.b
        r = b - tt.exp(x)
        return r

    def forward(self, x):
        b = self.b
        r = tt.log(b - x)
        return r

upperbound = UpperBound

and then

class Bounded(Continuous):
    """A bounded distribution."""
    def __init__(self, distribution, lower, upper, *args, **kwargs):
        self.dist = distribution.dist(*args, **kwargs)

        self.__dict__.update(self.dist.__dict__)
        self.__dict__.update(locals())

        if hasattr(self.dist, 'mode'):
            self.mode = self.dist.mode

        if not np.isinf(lower) and not np.isinf(upper):
            self.transform = transforms.interval(lower, upper)

        if not np.isinf(lower) and np.isinf(upper):
            self.transform = transforms.lowerbound(lower)

        if np.isinf(lower) and not np.isinf(upper):
            self.transform = transforms.upperbound(upper)

    def _random(self, lower, upper, point=None, size=None):
        samples = np.zeros(size).flatten()
        i, n = 0, len(samples)
        while i < len(samples):
            sample = self.dist.random(point=point, size=n)
            select = sample[np.logical_and(sample > lower, sample <= upper)]
            samples[i:(i+len(select))] = select[:]
            i += len(select)
            n -= len(select)
        if size is not None:
            return np.reshape(samples, size)
        else:
            return samples

    def random(self, point=None, size=None, repeat=None):
        lower, upper = draw_values([self.lower, self.upper], point=point)
        return generate_samples(self._random, lower, upper, point,
                                dist_shape=self.shape,
                                size=size)

    def logp(self, value):
        return bound(self.dist.logp(value),
                     value >= self.lower, value <= self.upper)

Do you think it is a good solution? I've done some tests and it looks like it's working, provided you pass a testval that is inside the boundaries.

@twiecki
Copy link
Member

twiecki commented Jun 20, 2016

Yeah, that looks correct. I wonder if there is a way to integrate this better with the existing transformations in transforms.py (e.g. https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/transforms.py#L72) to avoid duplication.

@aasensio
Copy link
Contributor Author

It's true that Log is a particular case of LowerBound. Maybe it's enough to rename my LowerBoundwith Logand check inside whether the lower bound is zero or not. My impression is that LowerBoundand UpperBoundare somehow more self-explanatory, but it requires changing the transformations in several distributions. If I find some time during these days, I can do a PR with these changes.

@twiecki
Copy link
Member

twiecki commented Jun 20, 2016

@aasensio I agree. A PR would great. It's good enough if you consider the trade-offs and do the implementation you think is best.

@taku-y
Copy link
Contributor

taku-y commented Jun 20, 2016

@aasensio Your code looks good to me. PR would be nice.

@twiecki
Copy link
Member

twiecki commented Jul 8, 2016

Closed via #1205.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants