Back to basics with David Mackay #3: Gibbs & Slice samplers
Published on May 4, 2019
In this post, I just implement a Gibbs sampler and a slice sampler for a non-totally-trivial distribution. Both are vanilla versions: no overrelaxation for Gibbs, and no elliptical slice sampling, rectangular hyper-boxes, etc. I am hoping you never use these IRL. It is a good intro though.
Generating the data
Instead of the usual “let’s draw from a 1-d gaussian” example, let’s make our samplers work a bit. We’re going to generate a mixture of three 2-d gaussians where the dimensions are correlated.
The following class creates such a distribution.
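A minimal sketch of such a `GaussianMix` class, assuming numpy and scipy; the `sample`, `pdf`, and `component_weights` names are illustrative rather than the exact interface:

```python
import numpy as np
from scipy import stats

class GaussianMix:
    """Mixture of 2-d gaussians with correlated dimensions (sketch)."""

    def __init__(self, means, covs, weights):
        self.means = [np.asarray(m, dtype=float) for m in means]
        self.covs = [np.asarray(c, dtype=float) for c in covs]
        self.weights = np.asarray(weights, dtype=float)
        self.weights = self.weights / self.weights.sum()
        self.components = [stats.multivariate_normal(m, c)
                           for m, c in zip(self.means, self.covs)]

    def sample(self, n):
        """Draw n points: pick a component per point, then draw from it."""
        ks = np.random.choice(len(self.components), size=n, p=self.weights)
        return np.array([self.components[k].rvs() for k in ks])

    def pdf(self, x):
        """Mixture density at a point x."""
        return sum(w * c.pdf(x) for w, c in zip(self.weights, self.components))

    def component_weights(self, x):
        """Probability that x came from each component -- used later as the
        multinomial weights when picking which conditional to draw from."""
        p = np.array([w * c.pdf(x) for w, c in zip(self.weights, self.components)])
        return p / p.sum()
```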
Let’s use this to generate a few different distributions to see what they look like.
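For instance (the means, covariances, and weights below are made-up illustrations, not the exact ones behind the plots):

```python
import matplotlib.pyplot as plt

np.random.seed(0)
mix = GaussianMix(
    means=[[-4, -4], [0, 0], [5, 3]],         # three separated islands
    covs=[[[1.0, 0.8], [0.8, 1.0]],           # correlated dimensions
          [[2.0, -1.5], [-1.5, 2.0]],
          [[1.0, 0.9], [0.9, 1.0]]],
    weights=[0.3, 0.5, 0.2],
)
pts = mix.sample(5000)
plt.scatter(pts[:, 0], pts[:, 1], s=2, alpha=0.3)
plt.show()
```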
So, not easy ones: some islands, some covariance, some corners. Now let’s write our samplers.
Slice sampler
Here’s a 2-d slice sampler - we create two auxiliary variables. The code is based on the pseudo-code in Mackay’s book. Note that the proposal boxes are squares and are shrunk at the same rate in both dimensions. Not great, since we know there is a lot of covariance. In a later post, we might check out some of the more advanced techniques to improve the proposal.
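A rough sketch of such a 2-d slice sampler, assuming an unnormalised density callable like `GaussianMix.pdf` above; this version uses a single auxiliary height and a shrinking square box, and skips stepping out for brevity, so it may differ in detail from the real code:

```python
def slice_sample(pdf, x0, n_samples, width=2.0):
    """2-d slice sampler with a square box that shrinks towards the
    current point at the same rate in both dimensions (sketch)."""
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        # Auxiliary variable: a height drawn uniformly under the density at x.
        u = np.random.uniform(0, pdf(x))
        # Square box centred on the current point (stepping out omitted).
        lower = x - width / 2
        upper = x + width / 2
        while True:
            # Propose uniformly inside the current box.
            x_new = np.random.uniform(lower, upper)
            if pdf(x_new) > u:
                x = x_new
                break
            # Shrink the box towards the current point, same rate in both dims.
            lower = np.where(x_new < x, x_new, lower)
            upper = np.where(x_new > x, x_new, upper)
        samples.append(x.copy())
    return np.array(samples)
```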
Let’s see how it does for the last example distribution above.
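For example, continuing with the `mix` and `slice_sample` sketches from above (the sample count is arbitrary):

```python
ss = slice_sample(mix.pdf, x0=[0.0, 0.0], n_samples=20000)
plt.scatter(ss[:, 0], ss[:, 1], s=2, alpha=0.3)
plt.show()
```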
Not bad! Generally the right shape, though the lower island seems to have been sampled less. We also seem to have sufficient samples from the middle of the distribution, but the tail samples seem to be missing.
Gibbs sampler
We need the conditionals for the Gibbs sampler. Since we are working with multivariate normals, that’s easy.
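Concretely, for a single bivariate component with mean $(\mu_1, \mu_2)$, standard deviations $\sigma_1, \sigma_2$ and correlation $\rho$, the conditional of one coordinate given the other is the standard result:

$$
x_1 \mid x_2 \;\sim\; \mathcal{N}\!\left(\mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}\,(x_2 - \mu_2),\;\; (1 - \rho^2)\,\sigma_1^2\right)
$$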
The tricky part is picking which gaussian conditional to draw from. Remember that we have 3 multivariate gaussians, so any given point could have come from any of those three. But we know how likely it is that it came from each of them. We can just use these as weights and draw from a multinomial to pick the conditional distribution.
The code to get the weights is already included in the GaussianMix class above. Here’s the code for the sampler.
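A minimal sketch of such a Gibbs sampler, assuming the `component_weights` method from the `GaussianMix` sketch above:

```python
def gibbs_sample(mix, x0, n_samples):
    """Gibbs sampler for the 2-d gaussian mixture (sketch)."""
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        for d in (0, 1):  # update each coordinate in turn
            # Pick a component via a multinomial draw on the responsibilities.
            w = mix.component_weights(x)
            k = np.random.choice(len(w), p=w)
            mu, cov = mix.means[k], mix.covs[k]
            other = 1 - d
            # 1-d conditional of the chosen bivariate component (formula above).
            slope = cov[d, other] / cov[other, other]
            cond_mean = mu[d] + slope * (x[other] - mu[other])
            cond_var = cov[d, d] - slope * cov[d, other]
            x[d] = np.random.normal(cond_mean, np.sqrt(cond_var))
        samples.append(x.copy())
    return np.array(samples)
```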
Let’s draw!
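For illustration, using the sketches above (the 25,000 total draws are an arbitrary choice):

```python
gs = gibbs_sample(mix, x0=[0.0, 0.0], n_samples=25000)[5000:]
plt.scatter(gs[:, 0], gs[:, 1], s=2, alpha=0.3)
plt.show()
```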
We discarded the first 5000 samples as burn-in and plotted the rest. Now we have plenty of samples (too many?) from the tails, and the shape of the core is still approximately correct.
Comparison
We can do better than visual inspection though. Let’s generate 1000 different distributions and, for each, compare the KL divergence of a marginal between the samples drawn directly from GaussianMix and the samples produced by the two samplers.
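One possible way to get such a score (not necessarily the exact one used here) is a histogram-based KL estimate on a single marginal; the bin count and the small floor value are arbitrary choices:

```python
from scipy.stats import entropy

def marginal_kl(true_samples, sampled, dim=0, bins=50):
    """Histogram-based KL(true || sampled) on one marginal (sketch)."""
    lo = min(true_samples[:, dim].min(), sampled[:, dim].min())
    hi = max(true_samples[:, dim].max(), sampled[:, dim].max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(true_samples[:, dim], bins=edges)
    q, _ = np.histogram(sampled[:, dim], bins=edges)
    # Small floor so empty bins don't blow up the divergence.
    return entropy(p + 1e-9, q + 1e-9)
```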
I’ve trimmed the values greater than 1.0 for readability.
So Gibbs generally does better than slice, with lower KL divergence scores. But not always.
Here are some where Gibbs does better than slice:
And some where slice does better than Gibbs:
For multimodal distributions, slice does better than Gibbs. There may be a better Gibbs sampler that overcomes this, but it makes sense for our implementation: the multinomial draw is very unlikely to shift you to one of the other, far-away gaussians once you are stuck in one.
So if you have little covariance and the distribution is unimodal (and you have the conditionals!), Gibbs rocks it. The covariance issue is not a deal breaker (see overrelaxation), but I’m not sure how I’d get over the multimodal problem in my example.