This week, we discussed dynamical models for MCMC. We began with a toy example using Gibbs / Metropolis sampling to motivate the dynamical methods. Then we discussed two related dynamical models; Stochastic dynamics and Hybrid Monte-Carlo. Sanmi Led the discussion.

The toy model is as follows. The goal is to estimate the likelihood of a bivariate Gaussian. In other words, P(X) = Gaussian(m,S). We assume without loss of generality that m = 0. We let the variance of x_1 = variance of x_2 = 1, with a correlation of 0.99. For Gibbs sampling, it is clear that the marginal distribution P(x_1 | x_2) = Gaussian and can easily be sampled. Similarly, we can use a Gaussian distribution centered around the current state as the proposal distribution for Metropolis sampling. The variance of this Gaussian will have to be kept small so the acceptance probability is reasonable. In both cases, we showed that we explore the more constrained direction well, but sampling in the less constrained direction has almost random walk behavior. We also showed that this can be avoided for nearly independent variables. This observation illustrates why one might prefer the variable groups used for Gibbs / Metropolis sampling to be approximately independent.

It is clear that we need a method that allows large jumps in state so we can explore the high probability regions quickly (faster than with a random walk). Dynamical systems can help to overcome this random walk behavior. We began this section with a review of dynamical systems. The canonical (Gibbs) distribution can be defined as P(s) = .5*exp(-E(s)/T) where s is the micro-state, E(s) is the energy of the state, and T is the temperature. Any parametric distribution over variables that is nowhere zero can also be written in this form. This is the form we used for Gibbs/Metropolis sampling. The idea behind dynamical systems is to add dynamics that allow large changes in state without changing the energy of the system (equivalently, without changing the distribution). We studied stochastic dynamics and hybrid Monte-Carlo sampling methods that achieve this.

We defined a phase space, made up of position variables and momentum variables. The position variables are analogous to the parameters of a model. The momentum variables have no clear interpretation, but they will allow the large sampling jumps we desire. We use a very simple I.I.D. Gaussian distribution for the momentum variables, and the distribution of the position (parameter) variables remains the same. In this simple model, the position and momentum variables are independent. The combined energy is called the “Hamiltonian”. We defined Hamiltonian dynamics; that govern how the states change. The Hamiltonian we chose had special properties. It does not change the energy (distribution) of the system, and the dynamics can be reversed in time.

In order to actually implement this system, we need to discretize the dynamics. We showed how the “leapfrog” discretization with a step-size of eps can be used. We showed that this model introduced a systematic error of order(eps^2). This means that the energy of the system no longer remains constant; though the change is relatively small. To create samples, we used stochastic dynamics:

- Stochastic step: Sample the momentum variables. This can be done using Gibbs sampling as the momentum variables are I.I.D. Gaussians.
- Dynamics step: Evolve the system dynamics using leapfrog iterations.

For sampling the model parameters, one simply ignores the momentum variables since the positions and momentums are independent. In other words, the dynamics of the position and momentum variables are coupled, but their probabilities are independent.

How do we deal with the systematic error in the system energy (distribution)? Hybrid Monte-Carlo modifies the dynamics step to remove this bias. The Stochastic step is identical, but the dynamics step is modified as follows:

- Randomly decide whether to run the dynamics forwards or backwards in time.
- After the leapfrog iterations, decide whether to accept or reject the new state with some acceptance probability.

The acceptance probability is very similar to the Metropolis acceptance probability. It also measures how much the distribution changes; to discourage large changes. We showed that this algorithm satisfied detailed balance, and so the Markov chain generated is ergodic.

Next week, we will discuss simulated annealing and methods for estimating the free energy.

Tags: MCMC

December 7, 2009 at 1:57 pm |

[…] Unfortunately, the explosive decompression of a pen destroyed my notes for part 4, but as always, Sanmi has a nice writeup […]