Bean Machine is a probabilistic programming language that makes developing and deploying generative probabilistic models intuitive and efficient. This page describes the motivation for using probabilistic programming in general, and the advantages of Bean Machine in particular.
Bean Machine's generative modeling is concerned not only with providing useful predictions (as traditional ML techniques do), but also with estimating the uncertainty inherent in the problem at hand in the form of probability distributions. Estimating uncertainty helps ensure that predictions are reliable and robust.
Generative modeling with Bean Machine offers many benefits:
- Uncertainty estimation. Predictions are quantified with reliable measures of uncertainty in the form of probability distributions. An analyst can understand not only the system's prediction, but also the relative likelihood of other possibilities.
- Expressivity. It's extremely easy to encode a rich model directly in source code. This allows one to match the structure of the model to the structure of the problem.
- Interpretability. Because the model matches the domain, one can query intermediate variables within the model as conceptually meaningful properties. This can be used to interpret why a particular prediction was made, and can aid the model development process.
Generative Probabilistic Models
A generative probabilistic model consists of random variables and conditional probability distributions (CPDs) that encode knowledge about some domain. For example, consider a simplified model for the spread of infectious diseases, where we wish to express the idea that the average number of new cases on a given day is proportional to the current number of infections, with the proportionality constant being the daily reproduction rate of the disease. In order to express this mathematically, it is common practice to rely on elementary probability distributions (EPDs) with well-known statistics, such as the Poisson distribution here:
num_new ~ Poisson(reproduction_rate * num_init)
For now, let's fix the value of num_init, the initial number of infections. The above statement then gives the CPD of the random variable num_new, conditioned on the value of its parent random variable reproduction_rate. Since the parameter of the Poisson distribution is also its mean, this CPD is consistent with the knowledge that we were trying to express.
A well-formed generative model must specify the EPD or CPD of each random variable, and the directed graph induced by all the parent-child relationships among random variables must be acyclic. To complete our model, we must therefore also specify a distribution for the random variable reproduction_rate. In the case of new diseases, where we don't know anything yet about the actual reproduction rate, this poses a seemingly intractable problem. In the Bayesian approach to this problem, we specify the probability distributions of random variables without parents as our a priori beliefs (i.e., before seeing any data) about them. So, in this example, if infectious disease experts believe that a new disease would have a daily reproduction rate which is strictly positive and could be expected to be drawn from a distribution with a mean of 0.1, then we could express this belief with the help of another EPD, the Exponential distribution, as follows:
reproduction_rate ~ Exponential(1 / 0.1)
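To make the two statements above concrete, here is a minimal forward-sampling sketch in plain Python. It is not Bean Machine syntax: the helper names (sample_poisson, sample_world) and the small num_init value are assumptions made purely for illustration. It draws reproduction_rate from its prior and then num_new from its CPD, following the parent-to-child order of the graph.

```python
import random

PRIOR_RATE = 1 / 0.1  # rate parameter of the Exponential prior, so its mean is 0.1


def sample_poisson(rng, mean):
    """Draw from Poisson(mean) by counting unit-rate arrivals in [0, mean]."""
    count = 0
    arrival_time = rng.expovariate(1.0)
    while arrival_time <= mean:
        count += 1
        arrival_time += rng.expovariate(1.0)
    return count


def sample_world(rng, num_init):
    """Sample parents before children: reproduction_rate first, then num_new."""
    reproduction_rate = rng.expovariate(PRIOR_RATE)
    num_new = sample_poisson(rng, reproduction_rate * num_init)
    return {"reproduction_rate": reproduction_rate, "num_new": num_new}


rng = random.Random(0)
# num_init = 1000 is a hypothetical, deliberately small value that keeps the
# simple Poisson sampler above fast.
worlds = [sample_world(rng, num_init=1000) for _ in range(2000)]
mean_rate = sum(w["reproduction_rate"] for w in worlds) / len(worlds)
mean_new = sum(w["num_new"] for w in worlds) / len(worlds)
print(f"mean reproduction_rate: {mean_rate:.3f}, mean num_new: {mean_new:.1f}")
```

Averaged over many sampled worlds, reproduction_rate should come out near its prior mean of 0.1, and num_new near 0.1 * num_init.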
Given a generative model, the natural next step is to use it to perform inference. Inference is the process of combining a model with data to obtain insights, in the form of a posteriori beliefs over values of interest. Our documentation refers to the data as observations, to the values of interest as queried random variables, and to the insights as posterior distributions.
In our example above, let's say we observe that num_init = 1087980 and that num_new = 238154. Now, given these observations, we might want to query the posterior distribution for reproduction_rate. Mathematically speaking, we seek the following CPD:
P(reproduction_rate | num_init = 1087980, num_new = 238154)
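For this particular model, the posterior P(reproduction_rate | num_init, num_new) happens to be available in closed form, which gives a useful reference point. An Exponential prior with rate b is a Gamma(1, b) distribution, and combining it with a Poisson likelihood whose mean is reproduction_rate * num_init yields a Gamma posterior with shape 1 + num_new and rate b + num_init. The short sketch below evaluates its mean and standard deviation; the function name is illustrative and not part of Bean Machine.

```python
import math


def conjugate_posterior(num_init, num_new, prior_rate=1 / 0.1):
    """Return (shape, rate) of the Gamma posterior over reproduction_rate."""
    shape = 1 + num_new
    rate = prior_rate + num_init
    return shape, rate


shape, rate = conjugate_posterior(num_init=1087980, num_new=238154)
posterior_mean = shape / rate             # mean of Gamma(shape, rate)
posterior_std = math.sqrt(shape) / rate   # standard deviation of Gamma(shape, rate)
print(f"posterior mean: {posterior_mean:.5f}, posterior std: {posterior_std:.6f}")
```

The posterior mean is about 0.219 per day, close to the raw ratio num_new / num_init, because with this much data the prior has little influence. Most models of practical interest have no such closed form, which is where general-purpose inference comes in.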
One way to understand the semantics of the inference task is to think of a generative probabilistic model as specifying a distribution over possible worlds. A world can be thought of as an assignment of specific admissible values to all random variables in the model. So, for example, some possible worlds in our case are:
reproduction_rate = 0.01, num_new = 9000,
reproduction_rate = 0.1, num_new = 90000, or
reproduction_rate = 0.9, num_new = 800000.
Our generative model specifies a joint probability distribution over these worlds, based on the prior distribution we've chosen for reproduction_rate and the likelihood of num_new given some reproduction_rate. The inference task is then to restrict attention to only those worlds in which num_new = 238154. We're interested in learning the resulting posterior distribution over reproduction_rate assignments within the worlds that are compatible with our observation.
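The possible-worlds view can be sketched directly in plain Python. The code below scores the three example worlds under the joint distribution, then keeps only worlds with num_new = 238154 and normalizes over a grid of reproduction_rate values. The grid on (0, 1] and the function name log_joint are assumptions for illustration; this brute-force approach only works because the model has a single latent variable.

```python
import math

NUM_INIT = 1087980
NUM_NEW_OBSERVED = 238154
PRIOR_RATE = 1 / 0.1  # Exponential prior with mean 0.1


def log_joint(reproduction_rate, num_new, num_init=NUM_INIT):
    """Log-probability of one world: log prior plus log likelihood."""
    log_prior = math.log(PRIOR_RATE) - PRIOR_RATE * reproduction_rate
    mean = reproduction_rate * num_init
    log_likelihood = num_new * math.log(mean) - mean - math.lgamma(num_new + 1)
    return log_prior + log_likelihood


# Score the three example worlds from the text.
example_worlds = [(0.01, 9000), (0.1, 90000), (0.9, 800000)]
world_scores = {world: log_joint(*world) for world in example_worlds}

# Restrict attention to worlds compatible with the observation, then normalize.
grid = [i / 100_000 for i in range(1, 100_001)]
log_weights = [log_joint(rate, NUM_NEW_OBSERVED) for rate in grid]
peak = max(log_weights)  # subtract the peak before exponentiating for stability
weights = [math.exp(lw - peak) for lw in log_weights]
total = sum(weights)
posterior_mean = sum(rate * w for rate, w in zip(grid, weights)) / total
print(f"grid posterior mean of reproduction_rate: {posterior_mean:.5f}")
```

Enumerating worlds like this becomes infeasible as soon as a model has more than a few random variables, which is why sampling-based inference is used in practice.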
Where Does Bean Machine Fit In?
In the rest of this Overview we'll introduce you to Bean Machine's syntax, and show you how it can be used to learn about problems like this one. Traditionally, lots of painstaking, hand-crafted work has gone into modeling generative scenarios. Bean Machine aims to handle all of the manual work involved in fitting data to your model, leaving you to focus on the exciting part: the problem itself! Keep on reading to find out how.
While we hope that the guides you'll find here are relevant to anyone with an ML background, there are great resources available if this is your first exposure to Bayesian analysis! We highly recommend the YouTube series Statistical Rethinking, which walks through Bayesian thinking and probabilistic modeling. For a more hands-on experience, you can check out the free, online tutorial Bayesian Methods for Hackers.
If you have previous knowledge of probabilistic programming, you may want to read below to learn more about specific details of the Bean Machine system. Otherwise, you can go directly to Quick Start for details on how Bean Machine is used!
Bean Machine Advantages
Bean Machine builds on top of PyTorch with a declarative modeling syntax, which makes it both performant and intuitive for building probabilistic models. It balances automation with flexibility by implementing cutting-edge inference algorithms while allowing the user to select and program custom inference for different problems and subproblems.
Bean Machine uses a site-based inference engine. "Sites" are random variable families, and Bean Machine uses these families to enable a modular inference engine.
The simplest form of site-based inference is called "single-site" inference. In the single-site paradigm, models are built up from random variables that can be reasoned about individually. Bean Machine can exploit this modularity to update random variables one-at-a-time, reducing unnecessary computation and enabling posterior updates that might not be possible if processing the entire model in one go.
Bean Machine also supports "multi-site" inference. Multi-site inference reasons about multiple families of random variables jointly. This increases complexity during inference, but it allows the engine to exploit inter-site correlations when fitting the posterior distribution.
Altogether, site-based inference is a flexible pattern for trading off complexity and modularity, and enables the advanced techniques outlined below.
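As a rough illustration of the single-site idea, the sketch below runs a random-walk Metropolis-Hastings sampler that updates one random variable at a time. It is plain Python rather than Bean Machine's engine, and the two-region data, step size, and function names are hypothetical. Each update only re-evaluates the terms that involve the site being changed, which is the modularity that single-site inference exploits.

```python
import math
import random

# Hypothetical data: (num_init, num_new) for two regions with separate rates.
DATA = {"north": (1000, 200), "south": (2000, 100)}
PRIOR_RATE = 1 / 0.1


def site_log_prob(region, rate):
    """Terms of the log joint that involve this one site (constants dropped)."""
    if rate <= 0:
        return -math.inf
    num_init, num_new = DATA[region]
    mean = rate * num_init
    return -PRIOR_RATE * rate + num_new * math.log(mean) - mean


def single_site_mh(num_sweeps, step=0.02, seed=0):
    rng = random.Random(seed)
    world = {region: 0.1 for region in DATA}
    samples = []
    for _ in range(num_sweeps):
        for region in world:  # update one site at a time
            current = world[region]
            proposal = current + rng.gauss(0.0, step)
            log_ratio = site_log_prob(region, proposal) - site_log_prob(region, current)
            if math.log(1.0 - rng.random()) < log_ratio:
                world[region] = proposal
        samples.append(dict(world))
    return samples


samples = single_site_mh(num_sweeps=5000)[500:]  # discard early sweeps
north_mean = sum(s["north"] for s in samples) / len(samples)
south_mean = sum(s["south"] for s in samples) / len(samples)
print(f"north: {north_mean:.3f}, south: {south_mean:.3f}")
```

In this toy model the two rates are independent given the data, so updating them separately loses nothing. When sites are strongly correlated, updating them jointly can be the better choice.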
Bean Machine provides a declarative syntax with first-class support for random variables. This makes it intuitive for users to reason about their models in a program as they would on paper; a model is simply defined by a collection of random variables that interact with each other in a particular way. In Bean Machine, random variables are implemented as decorated Python functions, which provide a unique pointer to that random variable. Using functions makes it simple to determine a random variable's definition, since it is contained in a function that is usually only a few lines long.
This 'localization', where a model is fully defined by a collection of random variables, allows Bean Machine to easily track dependencies between them. Armed with the model structure, at inference time, Bean Machine can control which parts of the model get updated and explored. As opposed to trace-based inference, where a model execution is stored, operating directly with random variables as first-class lazy objects and "worlds" to track their instantiations allows Bean Machine to reason about model-specific structure such as control flow. It also allows Bean Machine to easily reorder model execution or perhaps update only a subgraph, saving significant amounts of compute in models with many latent variables.
Bean Machine allows the user to design and apply powerful inference methods. Because Bean Machine can propose updates for random variables or groups of random variables individually, the user is free to customize the method which it uses to propose those values. Different inference methods can be supplied for different families of random variables. For example, a particular model can leverage gradient information when proposing values for differentiable random variables, and at the same time might sample from discrete ones with a particle filter. This "compositional inference" pattern enables seamless interoperation among any MCMC-based inference strategies.
At the same time, the user has control over which variables should be updated ("blocked") together versus independently, for instance for random variables that are tightly correlated. This "multi-site inference" may be able to further exploit multiple sites with inference-specific optimizations.
The CompositionalInference API balances this flexibility with automation, automatically selecting proposers based on the support of each random variable, while allowing the user to specify how to build the best compositional MCMC proposer with virtually no additional effort.
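The idea of choosing a proposer from a variable's support can be sketched in plain Python. This is not the CompositionalInference API. The under-reporting model (REPORT_PROB, REPORTED), the SUPPORT and PROPOSERS tables, and all function names are hypothetical. A continuous site gets a Gaussian random-walk proposer, an integer-valued site gets a +/-1 proposer, and both run inside the same Metropolis-Hastings loop.

```python
import math
import random

NUM_INIT = 100      # hypothetical, small so the chain mixes quickly
REPORTED = 10       # hypothetical observed number of reported cases
REPORT_PROB = 0.5   # hypothetical chance that a new case gets reported
PRIOR_RATE = 1 / 0.1


def log_joint(world):
    """Unnormalized log joint: Exponential prior, Poisson num_new, Binomial reports."""
    rate, num_new = world["reproduction_rate"], world["num_new"]
    if rate <= 0 or num_new < REPORTED:
        return -math.inf
    mean = rate * NUM_INIT
    log_prior = -PRIOR_RATE * rate
    log_poisson = num_new * math.log(mean) - mean - math.lgamma(num_new + 1)
    log_binomial = (math.lgamma(num_new + 1) - math.lgamma(num_new - REPORTED + 1)
                    + (num_new - REPORTED) * math.log(1 - REPORT_PROB))
    return log_prior + log_poisson + log_binomial


def continuous_walk(rng, value):
    return value + rng.gauss(0.0, 0.05)   # symmetric step on the real line


def integer_walk(rng, value):
    return value + rng.choice((-1, 1))    # symmetric step on the integers


PROPOSERS = {"continuous": continuous_walk, "integer": integer_walk}
SUPPORT = {"reproduction_rate": "continuous", "num_new": "integer"}


def compositional_mh(num_sweeps, seed=0):
    rng = random.Random(seed)
    world = {"reproduction_rate": 0.1, "num_new": REPORTED}
    samples = []
    for _ in range(num_sweeps):
        for site, support in SUPPORT.items():
            candidate = dict(world)
            candidate[site] = PROPOSERS[support](rng, world[site])
            if math.log(1.0 - rng.random()) < log_joint(candidate) - log_joint(world):
                world = candidate
        samples.append(dict(world))
    return samples


samples = compositional_mh(num_sweeps=20000)[2000:]
rate_mean = sum(s["reproduction_rate"] for s in samples) / len(samples)
print(f"posterior mean reproduction_rate: {rate_mean:.3f}")
```

Because both proposers are symmetric, the plain Metropolis acceptance ratio applies to each site. Swapping in a different proposer for one kind of support does not require touching the model or the other proposer.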
Bean Machine supports a variety of classic inference methods such as ancestral sampling and the No-U-Turn sampler (NUTS). However, the framework also leverages single-site understanding of the model in order to provide efficient methods that take advantage of higher-order gradients and model structure.
Bean Machine includes the first implementation of Newtonian Monte Carlo (NMC) in a more general platform. NMC utilizes second-order gradient information to construct a multivariate Gaussian proposer that takes local curvature into account. As such, it can produce samples very efficiently with no warmup period when the posterior is roughly Gaussian. Bean Machine's structural understanding of the model lets us keep computation relatively cheap by only modeling a subset of the space that is relevant to updating a particular random variable.
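To give a feel for a curvature-aware proposer, here is a one-dimensional sketch in plain Python. It is not Bean Machine's NMC implementation: it simply builds a Gaussian proposal from the gradient and second derivative of the log posterior for the disease example, and corrects it with a Metropolis-Hastings accept step. The function names and the choice to start at num_new / num_init are assumptions.

```python
import math
import random

NUM_INIT = 1087980
NUM_NEW = 238154
TOTAL_RATE = 1 / 0.1 + NUM_INIT  # prior rate plus num_init


def log_posterior(rate):
    """Unnormalized log posterior of reproduction_rate for the example model."""
    return NUM_NEW * math.log(rate) - TOTAL_RATE * rate


def newton_proposal(rate):
    """Gaussian proposal centered on a Newton step, scaled by local curvature."""
    grad = NUM_NEW / rate - TOTAL_RATE
    hess = -NUM_NEW / rate**2          # always negative for this model
    return rate - grad / hess, math.sqrt(-1.0 / hess)


def normal_logpdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def newton_mh(num_samples, seed=0):
    rng = random.Random(seed)
    rate = NUM_NEW / NUM_INIT          # crude starting point, an assumption
    samples, accepted = [], 0
    for _ in range(num_samples):
        mean_fwd, std_fwd = newton_proposal(rate)
        candidate = rng.gauss(mean_fwd, std_fwd)
        if candidate > 0:
            mean_rev, std_rev = newton_proposal(candidate)
            log_alpha = (log_posterior(candidate) - log_posterior(rate)
                         + normal_logpdf(rate, mean_rev, std_rev)
                         - normal_logpdf(candidate, mean_fwd, std_fwd))
            if math.log(1.0 - rng.random()) < log_alpha:
                rate = candidate
                accepted += 1
        samples.append(rate)
    return samples, accepted / num_samples


samples, acceptance = newton_mh(num_samples=2000)
sample_mean = sum(samples) / len(samples)
print(f"mean: {sample_mean:.5f}, acceptance rate: {acceptance:.2f}")
```

Because this posterior is close to Gaussian, the Newton-based proposal nearly matches it, so most candidates are accepted and consecutive samples are only weakly correlated.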
For certain domains, prepackaged inference methods may not be the best tool for the job. For example, if dealing with a problem specified in spherical coordinates, it may be useful to incorporate a notion of spherical locality into the inference proposal. Or, you may want to incorporate some notion of ordering when dealing with certain discrete random variables. Bean Machine exposes a flexible abstraction called custom proposers for just this problem. Custom proposers let the user design powerful new inference methods from the building blocks of existing ones, while easily plugging into Bean Machine's multi-site paradigm.
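As a small, hypothetical illustration of domain-aware locality, the sketch below uses a proposer for an angle that wraps around the circle, so that values just below 2*pi and just above 0 are treated as neighbors. It is a one-dimensional analogue of the spherical case, written in plain Python and not against Bean Machine's custom proposer interface. The von Mises target and its parameters are assumptions.

```python
import math
import random

MU = 0.1      # hypothetical preferred direction in radians, near the 0 / 2*pi seam
KAPPA = 4.0   # hypothetical concentration of the target


def log_density(theta):
    """Unnormalized von Mises log density on the circle."""
    return KAPPA * math.cos(theta - MU)


def circular_proposer(rng, theta, step=0.5):
    """Symmetric Gaussian step that wraps around the circle."""
    return (theta + rng.gauss(0.0, step)) % (2 * math.pi)


def run_chain(num_samples, seed=0):
    rng = random.Random(seed)
    theta = math.pi
    samples = []
    for _ in range(num_samples):
        candidate = circular_proposer(rng, theta)
        if math.log(1.0 - rng.random()) < log_density(candidate) - log_density(theta):
            theta = candidate
        samples.append(theta)
    return samples


samples = run_chain(num_samples=20000)
# Average directions via unit vectors so that the seam does not distort the mean.
circular_mean = math.atan2(sum(math.sin(t) for t in samples),
                           sum(math.cos(t) for t in samples))
print(f"circular mean: {circular_mean:.3f}")
```

A proposer that clipped or rejected steps at the seam would explore a target centered near 0 less efficiently; the wrapped step avoids that while remaining symmetric, so no extra correction term is needed in the acceptance ratio.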
Bean Machine Graph Compilation
PyTorch offers strong performance for models composed of a small number of large tensors. However, many probabilistic models have a rich or sparse structure that is difficult to write in terms of just a handful of large tensor operations. Also, certain inference algorithms such as NUTS have dynamic control flow, which incurs a significant amount of Python and PyTorch overhead during execution.
To address this, we are developing an experimental inference runtime called Bean Machine Graph (BMG) Inference. BMG Inference is a specialized combination of a compiler and a fast, independent C++ runtime that is optimized to run inference even for untensorized models. By design, BMG Inference has the same interface as other Bean Machine inference methods, relying on a custom behind-the-scenes compiler to interpret your model and translate it to a faster implementation with no Python dependencies.
BMG Inference routinely achieves speedups of one to two orders of magnitude for untensorized models. However, please note that this infrastructure is under development, and the supported feature set may be limited.