Sampling

COMP4670/8600 - Statistical Machine Learning - Tutorial

Textbook Questions

These questions are hand-picked to be of reasonable difficulty and to demonstrate what you are expected to be able to solve. Each question is labelled in Bishop as $\star$, $\star\star$, or $\star\star\star$ to rate its difficulty.

  • 8.1
  • 8.2
  • 8.10 Hint: For the first part, try marginalising the joint distribution to work out p(a,b). For the second part, you may want to use Bayes' rule.
  • 8.13
  • 8.14 Hint: How does energy relate to the distribution of x and y?
  • 8.15 Hint: You can follow the same steps used in the single variable example at the start of 8.4.1
  • 8.18 (Challenge) Hint: To convert to an undirected graph look at the start of 8.3.4. To convert to a directed graph simply pick a node to be your root node.

Assumed knowledge

  • Sampling (lectures)

After this lab, you should be comfortable with:

  • The concept of Monte Carlo methods in general
  • Importance sampling in particular

Setting up the environment $\newcommand{\Ex}{\mathbb{E}}$ $\newcommand{\dd}{\mathrm{d}}$ $\newcommand{\DUniform}[3]{\mathscr{U}\left(#1 ~\middle|~ #2, #3\right)}$ $\newcommand{\DNorm}[3]{\mathscr{N}\left(#1 ~\middle|~ #2, #3\right)}$

In [ ]:
import math
import numpy as np
from scipy.stats import uniform, multivariate_normal
import matplotlib.pyplot as plt

%matplotlib inline

The aim of this tutorial is to investigate the effects of different proposal distributions on importance sampling.

Sampling from Gaussian and Uniform Distributions

Note that help for a particular function can be obtained by prefixing its name with a question mark, for example ?uniform or ?multivariate_normal.

Repeat the following twice: once for a Gaussian with zero mean and unit variance, and once for a Uniform distribution on the unit square.

  1. Sample 1000 data points from a two dimensional distribution.
  2. Compute the two dimensional histogram, with 10 bins in each dimension.
  3. Visualise the histogram as a heatmap. There are various ways of doing this as a two dimensional image. Use an appropriate colormap.
  4. Visualise the difference between the theoretical and empirical values of the density.

The aim of this exercise is to observe the challenges of sampling in more than 1 dimension.
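As a starting point, steps 1, 2 and 4 for the Gaussian case could be sketched roughly as follows. This is one possible approach, not a reference solution: the seed, the histogram range $[-3, 3]$ and the variable names are assumptions made for illustration, and the sketch uses NumPy's own random generator rather than `scipy.stats`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

n_samples, n_bins = 1000, 10

# Step 1: sample from a 2-D Gaussian with zero mean and identity covariance.
samples = rng.standard_normal(size=(n_samples, 2))

# Step 2: 2-D histogram over an assumed range of [-3, 3] in each dimension.
# density=True rescales the counts so the histogram integrates to one over
# the binned region (samples outside the range are ignored).
edges = np.linspace(-3.0, 3.0, n_bins + 1)
hist, xedges, yedges = np.histogram2d(
    samples[:, 0], samples[:, 1], bins=[edges, edges], density=True
)

# Step 4: theoretical density evaluated at the bin centres.
# indexing="ij" makes theory[i, j] line up with hist[i, j].
centres = 0.5 * (edges[:-1] + edges[1:])
xc, yc = np.meshgrid(centres, centres, indexing="ij")
theory = np.exp(-0.5 * (xc**2 + yc**2)) / (2.0 * np.pi)

difference = hist - theory
print("largest absolute difference:", np.abs(difference).max())
```

Both `hist` and `difference` are plain 2-D arrays, so either can be passed to `plt.imshow` or `plt.pcolormesh` for step 3. With 1000 samples spread over 100 bins, each bin holds only a handful of points, which is why the empirical density is noticeably noisy.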

In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

A function for performing estimation

Given are the function $f(x, y)$

\begin{equation*} f(x, y) = x \, y \cos(x) \, \cos(y), \end{equation*}

and the (unnormalised!) distribution $ \widetilde{p}(x, y) $

\begin{equation*} \widetilde{p}(x, y) = \exp \left\{- \frac{1}{4}((x - 2)^2 + (y - 3)^2) \right\} - \frac{3}{4} \exp \left\{- \frac{1}{2} ((x - 2)^2 + (y - 3)^2) \right\}. \end{equation*}

The goal is to numerically estimate $ \Ex_{p(x, y)}[f(x, y)] $ defined as \begin{equation*} \Ex_{p(x, y)}[f(x, y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, p(x, y) \dd x \dd y \end{equation*}

where $ p(x, y) $ is the normalised probability distribution proportional to $\widetilde{p}(x, y) $, i.e. \begin{equation*} p(x, y) = \frac{ \widetilde{p}(x, y) } { \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \widetilde{p}(x, y) \dd x \dd y } \end{equation*}

Note that with importance sampling this can be achieved without calculating the normalisation. All we need to do is draw sample points from an appropriate proposal distribution $ \widetilde{q}(x, y) $ that has most of its probability mass in regions where $ \widetilde{p}(x, y) $ is also nonzero.
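To see why the normalisation is not needed, it may help to write the expectation as a ratio of two integrals and estimate both with the same samples. The following is the standard self-normalised importance sampling estimator, written under the assumption that the samples $(x_n, y_n)$, $n = 1, \dots, N$, are drawn from the normalised version $q(x, y)$ of $\widetilde{q}(x, y)$; the weights $w_n$ are notation introduced here.

\begin{equation*}
\Ex_{p(x, y)}[f(x, y)]
= \frac{\int \int f(x, y) \, \frac{\widetilde{p}(x, y)}{\widetilde{q}(x, y)} \, q(x, y) \dd x \dd y}
       {\int \int \frac{\widetilde{p}(x, y)}{\widetilde{q}(x, y)} \, q(x, y) \dd x \dd y}
\approx \frac{\sum_{n=1}^{N} w_n \, f(x_n, y_n)}{\sum_{n=1}^{N} w_n},
\qquad
w_n = \frac{\widetilde{p}(x_n, y_n)}{\widetilde{q}(x_n, y_n)}.
\end{equation*}

The normalisation constants of both $\widetilde{p}$ and $\widetilde{q}$ appear in the numerator and the denominator, so they cancel.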

In the following, we will implement importance sampling to estimate $ \Ex_{p(x, y)}[f(x, y)] $ for two choices of $ \widetilde{q}(x, y) $, a Uniform Distribution and a Gaussian Distribution.

(optional) Plot the functions $f(x,y)$ and $\widetilde{p}(x, y)$. (See the 3D plotting section in this tutorial.)

In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Importance Sampling using the Uniform Distribution

Draw random samples $ x_n $ and $ y_n $, $ n = 1, \dots, N $, each i.i.d. from the uniform distribution on the interval $[-10, 20] $, so that the proposal $ \widetilde{q}(x, y) $ is uniform on the square $[-10, 20]^2$.

Now use the samples from the distribution $ \widetilde{q}(x, y) $ to estimate $ \Ex_{p(x, y)}[f(x, y)] $ via importance sampling.
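A minimal sketch of this step is given below, assuming the self-normalised form of importance sampling (weighted sum of $f$ divided by the sum of the weights). The function names, the seed and the sample size are illustrative choices, not part of the tutorial.

```python
import numpy as np


def f(x, y):
    """The function whose expectation is estimated."""
    return x * y * np.cos(x) * np.cos(y)


def p_tilde(x, y):
    """Unnormalised target density from the tutorial."""
    r2 = (x - 2.0) ** 2 + (y - 3.0) ** 2
    return np.exp(-0.25 * r2) - 0.75 * np.exp(-0.5 * r2)


def importance_estimate_uniform(n_samples, low=-10.0, high=20.0, seed=0):
    """Self-normalised importance sampling with a uniform proposal."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=n_samples)
    y = rng.uniform(low, high, size=n_samples)
    # The uniform proposal density is constant on the square, so it cancels
    # in the ratio below and the weights reduce to p_tilde itself.
    w = p_tilde(x, y)
    return np.sum(w * f(x, y)) / np.sum(w)


estimate_uniform = importance_estimate_uniform(200_000)
print("uniform proposal estimate:", estimate_uniform)
```

Most uniform samples land far from $(2, 3)$, where $\widetilde{p}$ is tiny, so only a small fraction of them contribute meaningfully to the estimate.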

In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Importance Sampling using the Gaussian Distribution

Create random vectors from the following Gaussian Distribution $ \widetilde{q}(x, y) = \DNorm{(x, y)^T}{\mathbf{\mu} = (2, 3)^T}{\mathbf{\Sigma} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \\ \end{bmatrix}} $.

Now use the samples from the distribution $ \widetilde{q}(x, y) $ to estimate $ \Ex_{p(x, y)}[f(x, y)] $ via importance sampling.
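The Gaussian case could be sketched along the same lines. Again this assumes the self-normalised estimator, so the proposal density is only evaluated up to its normalisation constant; the helper names, seed and sample size are assumptions for illustration.

```python
import numpy as np


def f(x, y):
    """The function whose expectation is estimated."""
    return x * y * np.cos(x) * np.cos(y)


def p_tilde(x, y):
    """Unnormalised target density from the tutorial."""
    r2 = (x - 2.0) ** 2 + (y - 3.0) ** 2
    return np.exp(-0.25 * r2) - 0.75 * np.exp(-0.5 * r2)


def q_tilde(x, y):
    """Gaussian proposal with mean (2, 3) and covariance diag(2, 3),
    evaluated without its normalisation constant (it cancels in the ratio)."""
    return np.exp(-0.5 * ((x - 2.0) ** 2 / 2.0 + (y - 3.0) ** 2 / 3.0))


def importance_estimate_gaussian(n_samples, seed=0):
    """Self-normalised importance sampling with the Gaussian proposal."""
    rng = np.random.default_rng(seed)
    mean = np.array([2.0, 3.0])
    cov = np.array([[2.0, 0.0], [0.0, 3.0]])
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    x, y = samples[:, 0], samples[:, 1]
    w = p_tilde(x, y) / q_tilde(x, y)  # importance weights
    return np.sum(w * f(x, y)) / np.sum(w)


estimate_gaussian = importance_estimate_gaussian(200_000)
print("Gaussian proposal estimate:", estimate_gaussian)
```

Because this proposal is centred on the same point as $\widetilde{p}$ and has heavier tails in both directions, the weights stay bounded and far fewer samples are wasted than with the uniform proposal.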

In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Comparing the empirical and analytic results

For the function $f(x, y)$ and the normalised distribution $p(x, y)$, the correct result can be computed analytically
\begin{align*}
\Ex_{p(x, y)}[f(x, y)] & = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, p(x, y) \dd x \dd y \\
& = \frac{16 (5 \cos(1) + \cos(5) + \sin(1) - 5 \sin(5)) - 3 e (7 \cos(1) + 5 \cos(5) + \sin(1) - 5 \sin(5))}{10 \, e^2} \\
& \approx 0.670859
\end{align*}

Compare the convergence rate of the two approaches given above. On a single plot, show the empirical expectation of each approach as a function of the number of samples, together with the analytical value.
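One way to obtain the estimate as a function of the number of samples, without re-running the sampler for every sample size, is to take cumulative sums of the weighted values and of the weights. The sketch below assumes the self-normalised estimator; `running_estimate` and `plot_convergence` are hypothetical helper names, and the seed and sample size are arbitrary.

```python
import numpy as np

TRUE_VALUE = 0.670859  # analytic value quoted in the tutorial


def f(x, y):
    return x * y * np.cos(x) * np.cos(y)


def p_tilde(x, y):
    r2 = (x - 2.0) ** 2 + (y - 3.0) ** 2
    return np.exp(-0.25 * r2) - 0.75 * np.exp(-0.5 * r2)


def running_estimate(weights, values):
    """Self-normalised estimate using the first n samples, for n = 1..N."""
    return np.cumsum(weights * values) / np.cumsum(weights)


rng = np.random.default_rng(0)
n_samples = 50_000

# Uniform proposal on [-10, 20]^2: the constant density cancels.
xu = rng.uniform(-10.0, 20.0, size=n_samples)
yu = rng.uniform(-10.0, 20.0, size=n_samples)
run_uniform = running_estimate(p_tilde(xu, yu), f(xu, yu))

# Gaussian proposal with mean (2, 3) and covariance diag(2, 3); the diagonal
# covariance allows the two coordinates to be sampled independently.
xg = rng.normal(2.0, np.sqrt(2.0), size=n_samples)
yg = rng.normal(3.0, np.sqrt(3.0), size=n_samples)
q_vals = np.exp(-0.5 * ((xg - 2.0) ** 2 / 2.0 + (yg - 3.0) ** 2 / 3.0))
run_gaussian = running_estimate(p_tilde(xg, yg) / q_vals, f(xg, yg))


def plot_convergence(curves, true_value):
    """Plot each running estimate against the number of samples used."""
    import matplotlib.pyplot as plt

    for label, curve in curves.items():
        plt.plot(np.arange(1, len(curve) + 1), curve, label=label)
    plt.axhline(true_value, color="k", linestyle="--", label="analytic")
    plt.xscale("log")
    plt.xlabel("number of samples")
    plt.ylabel("estimate")
    plt.legend()
    plt.show()
```

Calling `plot_convergence({"uniform": run_uniform, "Gaussian": run_gaussian}, TRUE_VALUE)` in a notebook cell draws both curves and the analytic value on one set of axes. A logarithmic x-axis makes the early, noisy part of each curve easier to compare.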

In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Discuss how one could choose an appropriate proposal distribution for a particular function $f(x,y)$ and distribution $p(x,y)$.

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Interpretation

  1. Describe how this setup can be used to estimate a posterior predictive mean (very easy) and a posterior predictive variance (slightly trickier).
  2. Give an example case in Bayesian machine learning which would require sampling, i.e. which is analytically intractable.

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---
