Posts Tagged ‘Gaussian Process’

Wednesday 04/07/2010

April 28, 2010

Jeremy Stober put together a great set of notes on Gaussian processes for reinforcement learning.  The notes are very clear and easy to follow. You can find them here.

Wednesday 03/31/2010

April 7, 2010

Meghana completed the discussion of Gaussian processes for classification, and Sanmi discussed Expectation Propagation. You can find the notes here and here.

Wednesday 03/24/2010

March 24, 2010

Meghana began covering Gaussian processes for classification.  You can find the notes here.

Stochastic Process notes

March 23, 2010

Priyank is still working on the LaTeX notes. In the meantime, here are the scanned notes from those meetings.

Wednesday 03/03/2010, 03/10/2010

March 8, 2010

We began discussion of Gaussian Processes for regression.

I am going to try uploading the presenter’s notes directly. You can find the notes for this week here and here.

Next week (after spring break), we will move on to Gaussian Processes for classification. We will cover the end of chapter 2 in some detail in a couple of weeks.

We need a volunteer to cover MCMC for GPs. This paper on slice sampling was recommended.

Wednesday 02/10/2010, 02/17/2010, 02/24/2010

February 16, 2010

Priyank discussed some basic measure theory in preparation for our discussion on Gaussian processes. Next week, we will discuss how stochastic processes such as the Gaussian process arise from basic concepts.

There will be no weekly scribe notes during the series on basic theory. Instead, I will post a comprehensive set of scribe notes (with references) after we are done.

EDIT: We spent three meetings discussing general stochastic processes. I will post notes once they are available.

Wednesday 02/03/2010

February 9, 2010
Welcome back. This semester, the meetings will be held on Wednesday mornings at 11am in ACES 3.116. We will be discussing Gaussian processes, using the book “Gaussian Processes for Machine Learning” by Carl Edward Rasmussen and Chris Williams.
Goo led the first meeting. We covered a basic overview of Gaussian processes for regression. For details, see the NIPS 2006 tutorial “Advances in Gaussian processes” by Rasmussen.
The tutorial begins by discussing the prediction problem. Suppose one has some historical data, such as the carbon dioxide concentration in the air over time. We would like to predict the concentration at some future time. One simple solution is to find a linear fit. However, the yearly cycles in the CO2 concentration might suggest adding a sinusoidal term. There might also be other types of correlations observed in the data that we would like to model. How does one select the best model to use? How does one select the best parameters? Rasmussen states “Gaussian processes solve some of the above, and provide a practical framework to address the remaining issues”.
Definition (Rasmussen): A Gaussian process is a collection of random variables, any finite number of which have (consistent) Gaussian distributions.
Gaussian processes provide a principled framework for generalizing the multivariate Gaussian distribution to an infinite number of variables. Because of this property, a Gaussian process model is sometimes described as a method for defining a prior over functions. Given an index (or input) x, the Gaussian process is completely defined by some mean function \mu(x) and a covariance function k(\cdot, \cdot).  In most applications, we will assume \mu(x) = 0 and concentrate on the effect of the covariance. k(\cdot, \cdot) is also known as the kernel function. We will discuss the properties of this function in more detail over the coming weeks.
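To make the "prior over functions" idea concrete, here is a minimal sketch that draws a few functions from a zero-mean Gaussian process evaluated on a finite grid. The squared exponential kernel, its `length_scale` and `signal_var` parameters, the `jitter` term, and the function names are all assumptions chosen for illustration; none of them come from the meeting or the tutorial.

```python
import numpy as np

def squared_exponential(xa, xb, length_scale=1.0, signal_var=1.0):
    """Assumed kernel: k(x, x') = signal_var * exp(-(x - x')^2 / (2 * length_scale^2))."""
    diff = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

def sample_prior(x, n_samples=3, jitter=1e-6, seed=0):
    """Draw n_samples functions from a zero-mean GP prior, evaluated at the points x."""
    rng = np.random.default_rng(seed)
    # Any finite set of inputs has a joint Gaussian distribution with covariance K.
    # The small jitter on the diagonal keeps the Cholesky factorization stable.
    K = squared_exponential(x, x) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    # If z ~ N(0, I), then L z ~ N(0, K).
    z = rng.standard_normal((len(x), n_samples))
    return (L @ z).T

x = np.linspace(0.0, 5.0, 50)
samples = sample_prior(x)   # one row per sampled function
print(samples.shape)
```

Each row of `samples` is one draw of the function values at the 50 grid points. Changing `length_scale` changes how quickly the sampled functions wiggle, which is one way to see the effect of the covariance function.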
Gaussian process regression exploits the marginalization properties of the multivariate Gaussian distribution, i.e., conditionals and marginals of a joint Gaussian are also Gaussian. Suppose the learner is given data \{ x_i, y_i \}_{i = 1}^N and would like to learn a regression function modeled as y_i = f(x_i) + \epsilon, where \epsilon is the noise term. One good model for the function f is a Gaussian process with an appropriate kernel function. The mean and variance of the prediction are easy to write out (though it is a good exercise to work out the equations yourself). The predictive distribution for a new data point y^* is given as:
y^* | x^*, \mathbf{x}, \mathbf{y} \sim \mathcal{N} ( \mathbf{k}_n^T \mathbf{K}^{-1} \mathbf{y}, k_{n+1} - \mathbf{k}_n^T \mathbf{K}^{-1} \mathbf{k}_n ) ,
where \mathbf{k}_n is the length-N vector of kernel covariances between the training inputs and the new point x^*, k_{n+1} = k(x^*, x^*) is the prior variance at x^*, and \mathbf{K} is the N \times N kernel matrix capturing the covariance between the training data points.
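The predictive mean and variance can be computed in a few lines. The sketch below is one possible implementation under stated assumptions: a squared exponential kernel, and Gaussian noise with variance `noise_var` that is added to the diagonal of \mathbf{K} and to k_{n+1}. Neither choice is specified in the notes above, and the function names are hypothetical.

```python
import numpy as np

def squared_exponential(xa, xb, length_scale=1.0):
    """Assumed kernel: k(x, x') = exp(-(x - x')^2 / (2 * length_scale^2))."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, y_train, x_star, noise_var=0.1):
    """Return the predictive mean and variance of y* at a single input x_star."""
    xs = np.array([x_star], dtype=float)
    # K: N x N covariance of the training targets (assumption: noise on the diagonal).
    K = squared_exponential(x_train, x_train) + noise_var * np.eye(len(x_train))
    # k_n: covariances between the training inputs and x_star.
    k_n = squared_exponential(x_train, xs)[:, 0]
    # k_{n+1}: prior variance at x_star, plus noise since we predict y* rather than f*.
    k_np1 = squared_exponential(xs, xs)[0, 0] + noise_var
    # Use a Cholesky factor K = L L^T instead of forming K^{-1} explicitly.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    v = np.linalg.solve(L, k_n)                                # v^T v = k_n^T K^{-1} k_n
    mean = k_n @ alpha
    var = k_np1 - v @ v
    return mean, var

x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
mean, var = gp_predict(x_train, y_train, x_star=1.5)
print(mean, var)
```

The Cholesky factorization is a design choice for numerical stability; it computes the same quantities as the formula with \mathbf{K}^{-1}. Far from the training data, \mathbf{k}_n approaches zero, so the prediction falls back to the prior mean of zero with variance k_{n+1}.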
Next week, Priyank will discuss the idea of a stochastic process in more detail, and complete the discussion on Gaussian process regression.