
Rethinking AdvancedVI #24

@theogf


Alright! It's time to seriously take care of AdvancedVI :D

Here are some of the things we talked about in the meeting back in October:

  • There should be two distinct optimization paths, depending on whether the variational distribution is given as a function that rebuilds it from its parameters (like update_q) or as a distribution whose parameters are updated directly (see the step! sketch after this list).
  • Hyperparameter optimization should be supported cleanly; one proposal was (a usage sketch follows the list):
    # No hyperparameters: use the log joint as-is.
    makelogπ(logπ, ::Nothing) = logπ
    # Hyperparameters given: build the log joint from them.
    makelogπ(logπ, hyperparams) = logπ(hyperparams)
    function vi(..., logπ; hyperparams = nothing)
        ...
        while not_converged
            # Rebuild the log joint whenever the hyperparameters are updated.
            logjoint = makelogπ(logπ, hyperparams)
            for i in 1:n_inner
                ...
            end
        end
    end
  • We should consolidate the updates of the variational parameters into a more "atomic" step! function (see the step! sketch after this list).
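
For concreteness, here is what calling code could look like under the hyperparameter proposal above. This is a hedged usage sketch: the exact vi signature and the closure-builder style are assumptions for illustration, not the current AdvancedVI API.

    # Hedged usage sketch for the makelogπ proposal; the vi call signature is assumed.

    # Without hyperparameters: logπ is the log joint itself.
    logjoint(z) = -0.5 * sum(abs2, z)        # toy standard-normal log density
    # vi(algo, q, logjoint)                   # makelogπ(logjoint, nothing) returns logjoint unchanged

    # With hyperparameters: logπ is a builder that closes over them.
    logjoint_builder(hyper) = z -> -sum(abs2, z .- hyper.m) / (2 * hyper.s^2)
    # vi(algo, q, logjoint_builder; hyperparams = (m = 0.0, s = 1.0))  # makelogπ calls the builder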
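
And a minimal sketch of what the two optimization paths and the atomic step! could look like. The type names (FunctionalVariational, ParametricVariational) and the signatures are hypothetical, chosen only to illustrate the dispatch; they are not existing AdvancedVI types.

    # Hypothetical sketch: type names and signatures are illustrative only.

    # Case 1: q is rebuilt from its parameters by a user-supplied function (update_q style).
    struct FunctionalVariational{F,V}
        update_q::F   # θ -> q(θ)
        θ::V          # current variational parameters
    end

    # Case 2: q is a distribution whose parameters are read and written directly.
    struct ParametricVariational{D,V}
        q::D
        θ::V
    end

    # One "atomic" update of the variational parameters, dispatched on the representation.
    function step!(state::FunctionalVariational, ∇θ, η)
        state.θ .-= η .* ∇θ             # plain gradient step on the parameters
        return state.update_q(state.θ)  # rebuild the distribution from the new parameters
    end

    function step!(state::ParametricVariational, ∇θ, η)
        state.θ .-= η .* ∇θ             # update the parameters in place
        return state.q                  # the distribution already references the new parameters
    end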

And here are some more personal points (disclaimer: I will be happy to take care of these myself):

  • I don't think the current ELBO approach is good: the ELBO can always be split into an entropy term (depending only on the variational distribution) and an expectation term over the log joint. Most VI methods take advantage of this by computing the entropy gradient analytically (and smartly!), see "Doubly Stochastic Variational Inference" by Titsias for instance. My proposal would be to split the gradient into two parts (grad_entropy + grad_expeclog), each of which can then be specialized for the problem at hand (see the gradient-split sketch below).
  • I would personally argue that update_q only makes sense with the current, now obsolete, implementation that uses distributions with immutable fields like TuringMvNormal. See again Titsias's use of the reparametrization trick.
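
To illustrate the split (a hedged sketch, not AdvancedVI code): for a full-rank Gaussian q = N(μ, LLᵀ) with the reparametrization z = μ + Lε, ε ~ N(0, I), the entropy gradient is available in closed form (it only involves the diagonal of L), while the expectation term is estimated by Monte Carlo. The names grad_entropy and grad_expeclog and the use of Zygote here are assumptions for illustration only.

    # Hedged sketch of the grad_entropy + grad_expeclog split for q = N(μ, L Lᵀ),
    # using z = μ + L ε, ε ~ N(0, I). Function names are illustrative only.
    using LinearAlgebra, Zygote

    # Entropy: H(q) = const + log|det L|, so its gradient w.r.t. L is analytic
    # (only the diagonal of L contributes).
    grad_entropy(L::LowerTriangular) = Diagonal(1 ./ diag(L))

    # Monte Carlo estimate of ∇_{μ,L} E_q[log π(z)] via the reparametrization trick.
    function grad_expeclog(logπ, μ, L::LowerTriangular; nsamples = 10)
        d = length(μ)
        ∇μ, ∇L = zero(μ), zeros(d, d)
        for _ in 1:nsamples
            ε = randn(d)
            z = μ + L * ε
            g = Zygote.gradient(logπ, z)[1]  # ∇_z log π(z)
            ∇μ .+= g
            ∇L .+= g * ε'                    # chain rule through z = μ + L ε
        end
        return ∇μ ./ nsamples, LowerTriangular(∇L ./ nsamples)
    end

    # The full ELBO gradient is then grad_expeclog plus grad_entropy (the entropy only
    # touches L), and each piece can be specialized independently.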
