Variational Inference

Params

A Param represents a variational parameter to be optimized during variational inference. Use @bm.param to decorate an "initialization function" that returns a tensor used to initialize the variational parameter at the start of optimization.

Variational Worlds

A VariationalWorld is a sub-class of World which also contains data on guide distributions and their parameters, specifically:

get_guide_distribution: given an RVIdentifier, returns its corresponding guide distribution
get_param: given an RVIdentifier for a Param, returns the value of the parameter, initializing it first if it has not been set
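The initialize-on-first-access behavior of get_param can be illustrated with a minimal sketch. ToyParamStore, set_param, and init_mu are hypothetical names used only for this illustration and are not part of Bean Machine; the sketch uses floats and string keys where the real VariationalWorld works with tensors and RVIdentifiers.

```python
from typing import Callable, Dict, List


class ToyParamStore:
    """Hypothetical stand-in for the parameter bookkeeping of a variational world."""

    def __init__(self) -> None:
        self._values: Dict[str, float] = {}

    def get_param(self, name: str, init_fn: Callable[[], float]) -> float:
        # Run the initialization function only on first access;
        # later calls return whatever value is currently stored.
        if name not in self._values:
            self._values[name] = init_fn()
        return self._values[name]

    def set_param(self, name: str, value: float) -> None:
        # An optimizer would call this after each update.
        self._values[name] = value


init_calls: List[int] = []


def init_mu() -> float:
    # Hypothetical initialization function; records each invocation.
    init_calls.append(1)
    return 0.5


store = ToyParamStore()
first = store.get_param("mu", init_mu)   # triggers initialization
store.set_param("mu", 0.75)              # simulated optimizer update
second = store.get_param("mu", init_mu)  # returns the stored value, no re-init
```

The initialization function runs once, so an optimizer's updates are not overwritten by later lookups.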

Note: As an implementation detail, update_graph is overridden so that the guide distribution is used automatically whenever one is available.

Gradient Estimators and Divergences

A gradient_estimator computes a Monte-Carlo (possibly surrogate) objective estimate whose gradients are used as the training signal.

We structure our VI objective following abstractions introduced in f-Divergence Variational Inference, where gradient_estimator takes as input a discrepancy function corresponding to an $f$-divergence.
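For reference, an $f$-divergence can be written as $D_f(p \| q) = \mathbb{E}_{q}[f(p(x)/q(x))]$ for a convex $f$ with $f(1) = 0$; choosing $f(t) = -\log t$ recovers $\text{KL}(q \| p)$. The sketch below estimates this quantity by Monte Carlo for two univariate Gaussians and compares it with the closed-form KL. It is a library-independent illustration: the function names and signatures (kl_reverse, monte_carlo_divergence) are assumptions made here and do not mirror Bean Machine's gradient_estimator interface.

```python
import math
import random
from typing import Callable


def normal_logpdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def kl_reverse(log_ratio: float) -> float:
    # Discrepancy f(t) = -log t, expressed in terms of log t = log p(x) - log q(x).
    return -log_ratio


def monte_carlo_divergence(
    discrepancy: Callable[[float], float],
    q_mu: float, q_sigma: float,
    p_mu: float, p_sigma: float,
    num_samples: int,
    rng: random.Random,
) -> float:
    # Average f(p(x)/q(x)) over samples x drawn from the guide q.
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(q_mu, q_sigma)
        log_ratio = normal_logpdf(x, p_mu, p_sigma) - normal_logpdf(x, q_mu, q_sigma)
        total += discrepancy(log_ratio)
    return total / num_samples


rng = random.Random(0)
# q = N(0, 1), p = N(1, 2)
estimate = monte_carlo_divergence(kl_reverse, 0.0, 1.0, 1.0, 2.0, 20000, rng)

# Closed-form KL(q || p) for the same two Gaussians.
exact = math.log(2.0 / 1.0) + (1.0 ** 2 + (0.0 - 1.0) ** 2) / (2.0 * 2.0 ** 2) - 0.5
```

Swapping in a different discrepancy function changes the divergence being estimated without touching the sampling loop, which is the separation the abstraction above aims for.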

VariationalInfer

The VariationalInfer class provides an entry point for VI. Model and guide RVIdentifiers are associated through the constructor's queries_to_guides argument, and optimizer configuration is provided through an optimizer callback. The infer() method offers simple invocation, whereas step() permits more customized interactions (e.g. tensorboard callbacks).
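The division of labor between infer() and step() can be sketched with a toy optimizer. ToyVI below is a hypothetical class that minimizes a fixed quadratic loss by gradient descent; it only mirrors the calling pattern and makes no claim about the actual signatures or return values of VariationalInfer.

```python
from typing import List, Tuple


class ToyVI:
    """Hypothetical stand-in: minimizes (theta - 3)^2 by gradient descent."""

    def __init__(self, lr: float = 0.1) -> None:
        self.theta = 0.0
        self.lr = lr

    def step(self) -> float:
        # One optimization step; returns the loss measured before the update.
        loss = (self.theta - 3.0) ** 2
        grad = 2.0 * (self.theta - 3.0)
        self.theta -= self.lr * grad
        return loss

    def infer(self, num_steps: int) -> float:
        # Convenience wrapper: run a fixed number of steps with no hooks.
        for _ in range(num_steps):
            self.step()
        return self.theta


# Custom loop built on step(): log the loss every 10 iterations.
vi = ToyVI()
history: List[Tuple[int, float]] = []
for i in range(50):
    loss = vi.step()
    if i % 10 == 0:
        history.append((i, loss))  # a logging callback would go here

# Equivalent one-call usage when no per-step hooks are needed.
theta_from_infer = ToyVI().infer(50)
```

Exposing step() separately lets callers interleave logging, early stopping, or learning-rate schedules without the library having to anticipate each use case.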

AutoGuides

Manually defining a guide for each random variable can become tedious. AutoGuideVI provides an initialization strategy for VariationalInfer that defines guides automatically by calling a method get_guide(query: RVIdentifier, distrib: dist.Distribution), which subclasses implement.

All AutoGuides currently make a mean-field assumption over RVIdentifiers: $q(x) = \prod_{i \in \text{RVIDs}} q_i(x_i)$
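Under the mean-field assumption the joint guide density factorizes, so its log density is a sum of per-site terms: $\log q(x) = \sum_i \log q_i(x_i)$. A minimal sketch with univariate Gaussian factors follows; the site names "a" and "b" and the helper mean_field_log_prob are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, Tuple


def normal_logpdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def mean_field_log_prob(
    values: Dict[str, float],
    guides: Dict[str, Tuple[float, float]],
) -> float:
    # Each site has an independent factor q_i, so log q(x) is a plain sum.
    return sum(
        normal_logpdf(values[site], mu, sigma)
        for site, (mu, sigma) in guides.items()
    )


# Two sites, each with its own (mu, sigma) guide parameters.
guides = {"a": (0.0, 1.0), "b": (1.0, 2.0)}
values = {"a": 0.0, "b": 1.0}
log_q = mean_field_log_prob(values, guides)
```

Because no factor depends on another site's value, a mean-field guide cannot represent posterior correlations between sites.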

ADVI

In Automatic Differentiation Variational Inference (ADVI), a properly-sized Gaussian is used as a guide to approximate each site: $q_i \sim \mathcal{N}(\mu_i, \sigma_i)$
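To illustrate the idea for a single site, the sketch below fits $\mu$ and $\sigma$ of a Gaussian guide by stochastic gradient ascent on the ELBO, using the reparameterization $x = \mu + \sigma \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$. It assumes a hypothetical one-dimensional target posterior $\mathcal{N}(2, 0.5)$ with a hand-written gradient and omits the transformation of constrained variables to unconstrained space; it is not Bean Machine's implementation.

```python
import math
import random
from typing import Tuple

# Hypothetical target posterior N(2, 0.5); in practice only its log density
# up to a constant would be available.
TARGET_MU, TARGET_SIGMA = 2.0, 0.5


def grad_log_target(x: float) -> float:
    # d/dx log N(x; TARGET_MU, TARGET_SIGMA)
    return -(x - TARGET_MU) / TARGET_SIGMA ** 2


def fit_gaussian_guide(
    num_steps: int = 2000,
    num_samples: int = 50,
    lr: float = 0.02,
    seed: int = 0,
) -> Tuple[float, float]:
    rng = random.Random(seed)
    mu, rho = 0.0, 0.0  # sigma = exp(rho) keeps the scale positive
    for _ in range(num_steps):
        sigma = math.exp(rho)
        grad_mu, grad_rho = 0.0, 0.0
        for _ in range(num_samples):
            eps = rng.gauss(0.0, 1.0)
            x = mu + sigma * eps           # reparameterized sample from q
            g = grad_log_target(x)
            grad_mu += g                   # dx/dmu = 1
            grad_rho += g * sigma * eps    # dx/drho = sigma * eps
        grad_mu /= num_samples
        # Entropy of q is rho + const, contributing a gradient of 1 w.r.t. rho.
        grad_rho = grad_rho / num_samples + 1.0
        mu += lr * grad_mu
        rho += lr * grad_rho
    return mu, math.exp(rho)


mu_hat, sigma_hat = fit_gaussian_guide()
```

Since the target here is itself Gaussian, the fitted guide should land close to the target's mean and standard deviation, up to stochastic-gradient noise.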

MAP

In Maximum A Posteriori (MAP) inference, a Delta point estimate is used as the guide for each site: $q_i \sim \text{Delta}(\mu_i)$
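With a Delta guide, optimizing the guide's location amounts to maximizing the log joint density at that point. The sketch below does this by gradient ascent for a hypothetical conjugate model, $\theta \sim \mathcal{N}(0, 1)$ and $y_i \sim \mathcal{N}(\theta, 1)$, where the MAP estimate is known in closed form as $\sum_i y_i / (n + 1)$. The model, data, and function names are assumptions made for illustration.

```python
from typing import List


def grad_log_joint(theta: float, data: List[float]) -> float:
    # Prior N(0, 1) contributes -theta; each N(theta, 1) likelihood term
    # contributes (y - theta).
    return -theta + sum(y - theta for y in data)


def fit_map(data: List[float], num_steps: int = 100, lr: float = 0.1) -> float:
    theta = 0.0  # location of the Delta guide
    for _ in range(num_steps):
        theta += lr * grad_log_joint(theta, data)
    return theta


data = [1.0, 2.0, 3.0]
theta_map = fit_map(data)
closed_form = sum(data) / (len(data) + 1)
```

The point estimate carries no uncertainty information, which is the main trade-off relative to a Gaussian guide.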
