Single-Site Random Walk Metropolis-Hastings
Random Walk Metropolis-Hastings is a simple, minimal MCMC inference method. It is single-site by default, following the philosophy of most inference methods in Bean Machine; multi-site inference patterns are also supported. Random Walk Metropolis-Hastings follows the standard Metropolis-Hastings algorithm: sample a value from a proposal distribution, then accept or reject it according to the Metropolis-Hastings acceptance ratio computed for the proposed value. This is further detailed in the docs for Ancestral Metropolis-Hastings. This tutorial describes the proposal mechanism, describes adaptive Random Walk Metropolis-Hastings, and documents the API for the Random Walk Metropolis-Hastings algorithm.
Algorithm
Random Walk Metropolis-Hastings works on a single-site basis by proposing new values for a random variable that are close to the current value according to some notion of distance. As such, it is only defined for continuous random variables. How far a proposed value tends to be from the current value is determined by the proposal distribution, and can be tuned via a parameter provided when configuring the inference method. For discrete random variables, a similar effect may be achieved, but custom proposers must be used instead.
The Random Walk Metropolis-Hastings algorithm has multiple proposers defined on different spaces, such as all real numbers, positive real numbers, or intervals of the real numbers. These proposers all have common properties used to propose a new value $x'$ from a current value $x$. The proposal distribution $q(x' \mid x)$ is constructed to satisfy the following properties: $\mathbb{E}[x' \mid x] = x$ and $\mathrm{Var}[x' \mid x] = \sigma^2$.

$\sigma$ is the parameter that may be provided when configuring the inference method, and it must be a fixed positive number. Larger values of $\sigma$ will cause the inference method to explore more non-local values for $x'$. This may be good for faster exploration of the posterior, but it may cause lower-probability values to get proposed (and therefore rejected) as a result.
Adaptive Random Walk Metropolis-Hastings
Selecting a good $\sigma$ value is important for efficient posterior exploration. However, it is often challenging for a user to select a good value, as it requires a nuanced understanding of the posterior space. Consequently, Bean Machine provides an adaptive version of Random Walk Metropolis-Hastings, in which the inference engine automatically tunes the value of $\sigma$ during the first few samples of inference (known as the adaptation period).
The Random Walk Metropolis-Hastings algorithm is an exemplar use of the Bean Machine pattern for adaptive inference, which is enabled by the num_adaptive_samples argument in the call to infer(). This causes Bean Machine to run an adaptation phase at the beginning of inference for the provided number of samples. During this phase, Bean Machine will internally tweak the value of $\sigma$ in order to find the largest value that still results in a relatively low number of rejected proposals. Technically speaking, Random Walk adaptation will attempt to achieve an amortized acceptance rate of 0.234. How this value was chosen as the optimal acceptance rate is detailed in Optimal Scaling and Adaptive Markov Chain Monte Carlo.
Please note that samples taken during adaptation are not valid posterior samples, and so will not be shown by default in the MonteCarloSamples object returned from inference.
Usage
The following code snippet illustrates how to use the inference method. Here, step_size represents $\sigma$ from the algorithm above.
samples = bm.SingleSiteRandomWalk(
    step_size=2.0,
).infer(
    queries,
    observations,
    num_samples,
    num_adaptive_samples=500,
)
The parameters to infer() are described below:
Name | Usage
---|---
queries | A List of @bm.random_variable targets to fit posterior distributions for.
observations | The Dict of observations. Each key is a random variable, and its value is the observed value for that random variable.
num_samples | Number of samples to build up distributions for the values listed in queries.
num_adaptive_samples | Number of adaptation samples used to tune $\sigma$; these are not returned as posterior samples by default.
num_chains | Number of separate inference runs to use. Multiple chains can be used by diagnostics to verify inference ran correctly.