The No-U-Turn Sampler (NUTS) (Hoffman and Gelman, 2014) algorithm is an inference algorithm for differentiable random variables which uses Hamiltonian dynamics. It is an extension to the Hamiltonian Monte Carlo (HMC) inference algorithm.
If you haven't already read the docs on Hamiltonian Monte Carlo, please read those first.
The goal for NUTS is to automate the selection of an appropriate path length for HMC inference. It extends HMC by allowing the simulation of steps backwards in time during the leapfrog step. It also uses a smart simulation algorithm that can choose a path length heuristically, so that the proposed value tends to have a low correlation with the current value.
NUTS dynamically determines when the path starts looping backwards. In combination with the improvements from Adaptive HMC, this allow Bean Machine to automatically find the best step size and path length without requiring any user-tuned parameters.
NUTS decides on an optimal path length by building a binary tree where each path through the binary tree represents the trajectory of a sample. Each node at depth represents simulating steps forwards or backwards. This binary tree is adaptively grown until either hitting a pre-specified max depth size, or until the algorithm starts proposing samples with too low probabilities due to discretization errors.
The full NUTS algorithm description is quite involved. We recommend you check out Hoffman & Gelman, 2011 to learn more.
Bean Machine provides a single-site version of NUTS that only updates one variable at a time as well as a multi-site version of NUTS that updates all the variables in your model jointly at each step. Both follow the same API:
Functorch's ahead of time (AOT) autograd compiler is used
by default. If working with a non-static model or unexpected errors are encountered, you may need to manually
GlobalNoUTurnSampler has all the acceptance step size, covariance matrix, and acceptance probability tuning arguments of
GlobalHamiltonianMonteCarlo as well as a few more parameters related to tuning the path length. While there are many optional parameters for this inference method, in practice, the parameters you are most likely to modify are
max_tree_depth. When dealing with posteriors where the probability density has a more complicated shape, we benefit from taking smaller steps. Setting
target_accept_prob to a higher value like
0.9 will lead to a more careful exploration of the space using smaller step sizes while still benefiting from some tuning of that step size. Since we will be taking smaller steps, we need to compensate by having a larger path length. This is accomplished by increasing
max_tree_depth. Otherwise, using the defaults provided is highly recommended.
A more complete explanation of parameters to
GlobalNoUTurnSampler are provided below and in the docs:
|The maximum depth of the binary tree used to simulate leapfrog steps forwards and backwards in time.
|This is the lowest probability moves that NUTS will consider. Once most new samples have a lower probability, NUTS will stop its leapfrog steps. This should be interpreted as a negative log probability and is designed to be fairly conservative.
|The initial step size used in adaptive HMC. This value is simply the step size if tuning is disabled.
|Lets us decide between a faster multinomial sampler for the trajectory or the slice sampler described in the original paper. The option is useful for fairly comparing against other NUTS implementations.
|Indicates the acceptance probability which should be targeted by the step size tuning algorithm. While the optimal value is 65.1%, higher values have been show to be more robust leading to a default of 0.8.
|NNC (neural network compiler) is a Pytorch JIT compiler that that transforms Pytorch programs to LLVM-compiled binaries. The model support is currently limited, so if your model fails, consider filing an issue and turning this flag off.
The parameters to
infer are described below:
@bm.random_variable targets to fit posterior distributions for.
Dict of observations. Each key is a random variable, and its value is the observed value for that random variable.
|Number of samples to build up distributions for the values listed in
|Number of separate inference runs to use. Multiple chains can be used by diagnostics to verify inference ran correctly.
|Number of warmup samples to adapt the parameters.
Hoffman, Matthew D., and Andrew Gelman. "The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo." J. Mach. Learn. Res. 15.1 (2014): 1593-1623.