Bayesian regression using MCMC
Bayesian Regression has traditionally been very difficult to work with since analytical solutions are only possible for simple problems. Hence, the frequentist method called “least-squares regression” has been dominant for a long time. Bayesian approaches, however, has the advantage of working with probability distributions such that the estimated parameters of a model are represented by a probability distribution (formally we talk about “likelihood” for parameters and “probability” for random variables).
For more complex models it is possible to solve the Bayesian problem numerically using, for example, MCMC (Markov-Chain Monte-Carlo). It is a computationally expensive method which gives the solution as a set of points in the parameter space which are distributed according to the likelihood of the parameters given the data at hand.
For this example, we are working with a linear model of the form \(f(x)=a+bx + \epsilon\), where \(\varepsilon \sim N\left(0,\sigma^2\right)\) (normal distributed noise).
First, we need to generate some random data starting choosing 50 samples where \(a=1\), \(b=2\), \(\sigma^2=1\):
One simple way to solve the MCMC-problem is the Metropolis-Hastings method. It is based on evaluating changes in the posterior likelihood function one parameter at a time doing a random walk trying to stay in a region with high probability all the time. If the likelihood is multi-modal, it is, of course, possible to get stuck in one mode.
The resulting estimated likelihood given 100,000 samples for the linear regression is shown below where the red dot represents the highest likelihood, and the blue dot is the real parameters. The contours show a smoothed kernel estimate of the density of the distribution. Note that there is a slight covariance between parameters a and b which means that if you change one of the parameters the other has to change as well.
It turns out that the maximum likelihood of the MCMC-estimate and the Least-Squares method gives the same result, which is expected since maximum likelihood and least-squares are equal in the presence of Gaussian noise.
This example has just been a simple demonstration of how to find a good fit for model parameters given some data measurement. Based on the likelihood plots above we obtain some understanding of the sensitivity of changes in the parameters and if they are likely to be correlated to obtain maximum likelihood.
In the presence of non-gaussian noise and high dimensional complex models, MCMC might be your only way to obtain a solution at all at the cost of long durations of computation.