This week, we began the discussion of MCMC using *Probabilistic Inference Using Markov Chain Monte Carlo Methods* by Neal. Sanmi led the discussion.

We began by motivating probabilistic models of data, discussing some common model families, including belief networks and latent variable models. Realistic data often calls for complicated models, which can be difficult to optimize; simplifying assumptions, such as treating the data as independent and identically distributed (IID), can reduce the complexity of the optimization. We then discussed how latent variable models can induce correlations in the data model, even under IID assumptions, by averaging over the unknown variables.

We discussed the history of MCMC methods in data analysis, starting with the work of Metropolis et al. (’53) and Alder and Wainwright (’59), and discussed how MCMC developed from statistical physics models. Statistical physics often attempts to model the distribution of unknown micro-states, such as position and velocity, using macroscopic properties such as temperature and volume. We defined the energy function and showed how it can be used to model simple distributions over the states. If the system is closed, the energy E(s) is constant, E_0. The micro-canonical model assumes that the distribution of states is uniform over states with E(s) = E_0 and zero elsewhere. The “canonical” or “Gibbs” model assumes a more flexible system in which the probability of a state decays exponentially with its energy. This model is useful for studying large-scale system properties such as phase transitions, e.g., from solid to liquid.
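The canonical (Gibbs) distribution can be written explicitly; a sketch, with Boltzmann's constant absorbed into the temperature T:

```latex
P(s) \;=\; \frac{1}{Z}\,\exp\!\left(-\frac{E(s)}{T}\right),
\qquad
Z \;=\; \sum_{s} \exp\!\left(-\frac{E(s)}{T}\right),
```

where Z is the normalizing constant (partition function). The micro-canonical model is the limiting case in which all probability mass is spread uniformly over the states satisfying E(s) = E_0.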

We discussed a specific model, the 2-D Ising model of ferromagnetism, and how this model can be used to describe distributions over the pixels of images. Statistical physics models are closely related to statistical models in data analysis via a simple identification between the energy function and the probability distribution.
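As a concrete illustration of the connection, here is a minimal sketch of Metropolis sampling from the 2-D Ising model's Gibbs distribution. The lattice size, temperature, and sweep count are illustrative choices, not values from the discussion:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: a small lattice at a fixed temperature
# (coupling J = 1 and Boltzmann constant absorbed into T).
L = 16          # lattice side length
T = 2.0         # temperature
n_sweeps = 200

# Start from a random spin configuration; each spin is +1 or -1.
spins = rng.choice([-1, 1], size=(L, L))

def delta_energy(s, i, j):
    """Energy change from flipping spin (i, j), with periodic boundaries."""
    nb = (s[(i + 1) % L, j] + s[(i - 1) % L, j]
          + s[i, (j + 1) % L] + s[i, (j - 1) % L])
    return 2.0 * s[i, j] * nb

# Metropolis updates: accept a flip with probability min(1, exp(-dE / T)),
# which leaves the Gibbs distribution P(s) ∝ exp(-E(s)/T) invariant.
for _ in range(n_sweeps):
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        dE = delta_energy(spins, i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1

magnetization = abs(spins.mean())
print(f"|magnetization| per spin at T={T}: {magnetization:.3f}")
```

Below the critical temperature the chain drifts toward ordered configurations (large |magnetization|); above it, toward disordered ones. Interpreting each spin as a binary pixel gives exactly the image-distribution reading mentioned above.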

We showed how many quantities of interest, such as conditional distributions or predictive distributions, can be expressed as the expectation of some function with variables drawn from some complicated distribution. This motivated methods for computing such expectations. We began with analytical methods using closed-form integration, then discussed numerical integration. However, many distributions of interest in high-dimensional spaces have only a few peaks and large regions of low probability, and we discussed why numerical integration methods might fail in this scenario. We also discussed the second-order (Laplace) approximation, in which one approximates the distribution using a second-order Taylor expansion about some mode. The expansion is identified with a canonical distribution, such as a Gaussian, and the integral is computed using this substitute. Unfortunately, this method requires the Hessian, which might be difficult to compute in high dimensions, and it does not handle multi-modal distributions well. We discussed how this method can be improved to deal with multi-modal distributions.
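A small worked sketch of the second-order approximation in one dimension, using an illustrative energy function E(x) = cosh(x) (my choice, not from the discussion). Expanding E around its mode turns the unnormalized density exp(-E(x)) into a Gaussian, whose integral is known in closed form:

```python
import numpy as np

# Target: unnormalized density p(x) ∝ exp(-E(x)) with a single mode.
# E(x) = cosh(x) is an illustrative choice with mode x0 = 0.
def E(x):
    return np.cosh(x)

x0 = 0.0              # mode: E'(x0) = sinh(x0) = 0
hess = np.cosh(x0)    # E''(x) = cosh(x), so E''(x0) = 1

# Second-order Taylor expansion about the mode:
#   E(x) ≈ E(x0) + 0.5 * E''(x0) * (x - x0)^2,
# so the density is approximated by a Gaussian, giving
#   Z ≈ exp(-E(x0)) * sqrt(2π / E''(x0)).
Z_laplace = np.exp(-E(x0)) * np.sqrt(2.0 * np.pi / hess)

# Brute-force numerical integration for comparison (cheap in 1-D,
# but exactly the approach that breaks down in high dimensions).
xs = np.linspace(-10.0, 10.0, 20001)
Z_numeric = np.sum(np.exp(-E(xs))) * (xs[1] - xs[0])

print(f"Laplace:  {Z_laplace:.4f}")   # ≈ 0.92
print(f"Numeric:  {Z_numeric:.4f}")   # ≈ 0.84
```

The gap between the two numbers shows the approximation error from the Gaussian substitute; in higher dimensions the same recipe needs the full Hessian at the mode, and a single Gaussian centered on one mode ignores any other modes entirely.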

Next week, we will discuss other approximation methods such as importance sampling. We will also discuss relevant properties of Markov chains and use these to motivate MCMC methods.

Tags: MCMC
