
No-U-Turn Sampler

The No-U-Turn Sampler (NUTS) (Hoffman and Gelman, 2014) is an inference algorithm for differentiable random variables that uses Hamiltonian dynamics. It is an extension of the Hamiltonian Monte Carlo (HMC) inference algorithm.

tip

If you haven't already read the docs on Hamiltonian Monte Carlo, please read those first.

Algorithm

The goal of NUTS is to automate the selection of an appropriate path length $\lambda$ for HMC inference. It extends HMC by allowing steps to be simulated backwards in time during the leapfrog integration. It also uses a simulation heuristic to choose a path length such that the proposed value tends to have low correlation with the current value.

NUTS dynamically determines when the path starts looping back on itself. In combination with the improvements from Adaptive HMC, this allows Bean Machine to automatically find the best step size and path length without requiring any user-tuned parameters.

NUTS decides on an optimal path length by building a binary tree in which each path through the tree represents the trajectory of a sample. Each node at depth $j$ represents simulating $2^j$ steps forwards or backwards. This binary tree is grown adaptively until it either hits a pre-specified maximum depth, or until the algorithm starts proposing samples with probabilities that are too low due to discretization error.
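To make the "no U-turn" condition concrete, here is a minimal sketch (not Bean Machine's internal code) of the termination criterion described in the paper: the trajectory stops being extended once the momentum at either endpoint points back toward the other endpoint. The function and tensor names are illustrative.

```python
import torch

def is_u_turn(theta_minus, theta_plus, r_minus, r_plus):
    """Sketch of the U-turn check from Hoffman and Gelman (2014).

    theta_minus/theta_plus are the positions at the two ends of the
    simulated trajectory; r_minus/r_plus are the momenta at those ends.
    """
    delta = theta_plus - theta_minus
    # A U-turn has occurred when the momentum at either end projects
    # negatively onto the displacement between the endpoints, i.e. the
    # trajectory has started doubling back on itself.
    return torch.dot(delta, r_minus) < 0 or torch.dot(delta, r_plus) < 0
```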

The full NUTS algorithm description is quite involved. We recommend you check out Hoffman and Gelman (2014) to learn more.

caution

As with HMC, NUTS operates on continuous latent variables only. For discrete variables, use CompositionalInference or marginalize them out as in the Zero inflated count data tutorial.

Usage

Bean Machine provides a single-site version of NUTS, which updates only one variable at a time, as well as a multi-site version, which updates all of the variables in your model jointly at each step. Both follow the same API:

```python
bm.SingleSiteNoUTurnSampler(
    max_tree_depth=10,
    max_delta_energy=1000.0,
    initial_step_size=1.0,
    adapt_step_size=True,
    adapt_mass_matrix=True,
    multinomial_sampling=True,
    target_accept_prob=0.8,
).infer(
    queries,
    observations,
    num_samples,
    num_chains,
    num_adaptive_samples=1000,
)
```

```python
bm.GlobalNoUTurnSampler(
    max_tree_depth=10,
    max_delta_energy=1000.0,
    initial_step_size=1.0,
    adapt_step_size=True,
    adapt_mass_matrix=True,
    multinomial_sampling=True,
    target_accept_prob=0.8,
).infer(
    queries,
    observations,
    num_samples,
    num_chains,
    num_adaptive_samples=1000,
)
```
caution

Functorch's ahead-of-time (AOT) autograd compiler is used by default. If you are working with a non-static model, or if you encounter unexpected errors, you may need to manually disable the nnc_compile flag.
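For reference, here is a minimal end-to-end sketch of running GlobalNoUTurnSampler on a toy normal-normal model. The model and data are illustrative, not from the Bean Machine docs:

```python
import beanmachine.ppl as bm
import torch
import torch.distributions as dist

# Toy model: unknown mean with a wide normal prior.
@bm.random_variable
def mu():
    return dist.Normal(0.0, 10.0)

# Each data point is a noisy observation of mu.
@bm.random_variable
def x(i):
    return dist.Normal(mu(), 1.0)

data = torch.tensor([1.2, 0.9, 1.5])
observations = {x(i): data[i] for i in range(3)}

samples = bm.GlobalNoUTurnSampler().infer(
    queries=[mu()],
    observations=observations,
    num_samples=500,
    num_chains=4,
    num_adaptive_samples=500,
)
```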

The GlobalNoUTurnSampler has all of the step size, covariance matrix, and acceptance probability tuning arguments of GlobalHamiltonianMonteCarlo, as well as a few more parameters related to tuning the path length. While there are many optional parameters for this inference method, in practice the ones you are most likely to modify are target_accept_prob and max_tree_depth. When dealing with posteriors whose probability density has a more complicated shape, we benefit from taking smaller steps. Setting target_accept_prob to a higher value like 0.9 will lead to a more careful exploration of the space using smaller step sizes, while still benefiting from some tuning of that step size. Since we will be taking smaller steps, we need to compensate by using a longer path length, which is accomplished by increasing max_tree_depth, as sketched below. Otherwise, using the defaults provided is highly recommended.
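For example, a sketch of tuning these two parameters for a posterior with more complicated geometry (the specific values here are illustrative):

```python
samples = bm.GlobalNoUTurnSampler(
    target_accept_prob=0.9,  # smaller, more careful steps
    max_tree_depth=12,       # allow longer trajectories to compensate
).infer(
    queries,
    observations,
    num_samples,
    num_chains,
)
```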

A more complete explanation of the parameters to GlobalNoUTurnSampler is provided below and in the docs:

| Name | Usage |
| --- | --- |
| `max_tree_depth` | The maximum depth of the binary tree used to simulate leapfrog steps forwards and backwards in time. |
| `max_delta_energy` | The lowest probability of moves that NUTS will consider; once newly proposed samples mostly fall below this probability, NUTS stops its leapfrog steps. This should be interpreted as a negative log probability and is designed to be fairly conservative. |
| `initial_step_size` | The initial step size $\epsilon$ used in adaptive HMC. This value is simply the step size if tuning is disabled. |
| `multinomial_sampling` | Chooses between a faster multinomial sampler for the trajectory and the slice sampler described in the original paper. This option is useful for comparing fairly against other NUTS implementations. |
| `target_accept_prob` | The acceptance probability targeted by the step size tuning algorithm. While the theoretically optimal value is 65.1%, higher values have been shown to be more robust, leading to a default of 0.8. |
| `nnc_compile` | NNC (neural network compiler) is a PyTorch JIT compiler that transforms PyTorch programs into LLVM-compiled binaries. Model support is currently limited, so if your model fails, consider filing an issue and turning this flag off. |

The parameters to infer are described below:

| Name | Usage |
| --- | --- |
| `queries` | A `List` of `@bm.random_variable` targets to fit posterior distributions for. |
| `observations` | The `Dict` of observations. Each key is a random variable, and its value is the observed value for that random variable. |
| `num_samples` | Number of samples to build up distributions for the values listed in `queries`. |
| `num_chains` | Number of separate inference runs to use. Multiple chains can be used by diagnostics to verify that inference ran correctly. |
| `num_adaptive_samples` | Number of warmup samples used to adapt the parameters. |
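
Once inference completes, the returned samples object can be indexed by each query to retrieve its posterior samples. Continuing the toy model sketched earlier:

```python
# Shape: (num_chains, num_samples); here (4, 500).
mu_samples = samples[mu()]
print(mu_samples.mean(), mu_samples.std())
```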

Hoffman, Matthew D., and Andrew Gelman. "The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo." J. Mach. Learn. Res. 15.1 (2014): 1593-1623.