These questions are hand-picked to be of reasonable difficulty and to demonstrate what you are expected to be able to solve. Bishop labels each question with $\star$, $\star\star$, or $\star\star\star$ to rate its difficulty.

- **8.1**
- **8.2**
- **8.10** Hint: For the first part, try marginalising the joint distribution to work out $p(a, b)$. For the second part, you may want to use Bayes' rule.
- **8.13**
- **8.14** Hint: How does the energy relate to the distribution of x and y?
- **8.15** Hint: You can follow the same steps used in the single variable example at the start of 8.4.1.
- **8.18** (Challenge) Hint: To convert to an undirected graph, look at the start of 8.3.4. To convert to a directed graph, simply pick a node to be your root node.

- Sampling (lectures)

- The concept of Monte Carlo methods in general
- Importance sampling in particular

Setting up the environment $\newcommand{\Ex}{\mathbb{E}}$ $\newcommand{\dd}{\mathrm{d}}$ $\newcommand{\DUniform}[3]{\mathscr{U}\left(#1 ~\middle|~ #2, #3\right)}$ $\newcommand{\DNorm}[3]{\mathscr{N}\left(#1 ~\middle|~ #2, #3\right)}$

In [ ]:

```
import math
import numpy as np
from scipy.stats import uniform, multivariate_normal
import matplotlib.pyplot as plt
%matplotlib inline
```

The aim of this tutorial is to investigate the effects of different proposal distributions on importance sampling.

*Note that help for a function can be obtained by using the question mark, for example* `?uniform` *or* `?multivariate_normal`.

Repeat the following twice: once for a Gaussian with zero mean and unit variance, and once for a uniform distribution on the unit square.

- Sample 1000 data points from a two dimensional distribution.
- Compute the two dimensional histogram, with 10 bins in each dimension.
- Visualise the histogram as a heatmap. There are various ways of doing this as a two dimensional image. Use an appropriate colormap.
- Visualise the difference between the theoretical and empirical values of the density.

The aim of this exercise is to observe the challenges of sampling in more than 1 dimension.
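The steps listed above can be sketched as follows for the Gaussian case. This is a minimal sketch, not a reference solution: the seed, the bin range $[-3, 3]$, and all variable names are assumptions made for illustration, and the theoretical density is evaluated at the bin centres rather than averaged over each bin.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary choice) for reproducibility
n_samples, n_bins = 1000, 10

# Standard two-dimensional Gaussian: zero mean, identity covariance.
samples = rng.standard_normal(size=(n_samples, 2))

# Bin edges covering [-3, 3] in each dimension (the range is an assumption).
edges = np.linspace(-3.0, 3.0, n_bins + 1)
counts, _, _ = np.histogram2d(samples[:, 0], samples[:, 1], bins=[edges, edges])

# Convert counts to an empirical density: count / (N * bin area).
bin_area = (edges[1] - edges[0]) ** 2
empirical = counts / (n_samples * bin_area)

# Theoretical density evaluated at the bin centres.
centres = 0.5 * (edges[:-1] + edges[1:])
gx, gy = np.meshgrid(centres, centres, indexing="ij")
theory = np.exp(-0.5 * (gx ** 2 + gy ** 2)) / (2.0 * np.pi)

difference = empirical - theory
print("largest absolute deviation:", np.abs(difference).max())
```

Both `empirical` and `difference` are 10 by 10 arrays, so they can be shown as heatmaps, for example with `plt.imshow`. The uniform case follows the same pattern with `rng.uniform` and a constant theoretical density of 1 on the unit square.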

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```

Consider the function $f(x, y)$

\begin{equation*} f(x, y) = x \, y \cos(x) \, \cos(y), \end{equation*}and the (unnormalised!) distribution $ \widetilde{p}(x, y) $ \begin{equation*} \widetilde{p}(x, y) = \exp \left\{- \frac{1}{4}((x - 2)^2 + (y - 3)^2) \right\} - \frac{3}{4} \exp \left\{- \frac{1}{2} ((x - 2)^2 + (y - 3)^2) \right\} \end{equation*}

The goal is to numerically estimate $ \Ex_{p(x, y)}[f(x, y)] $, defined as \begin{equation*} \Ex_{p(x, y)}[f(x, y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, p(x, y) \dd x \dd y \end{equation*}

where $ p(x, y) $ is the normalised probability distribution proportional to $\widetilde{p}(x, y) $, i.e. \begin{equation*} p(x, y) = \frac{ \widetilde{p}(x, y) } { \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \widetilde{p}(x, y) \dd x \dd y } \end{equation*}

Note that with importance sampling, this can be achieved without calculating the normalisation. All we need to do is draw sample points from an appropriate proposal distribution $ \widetilde{q}(x, y) $ that has most of its probability mass in regions where $ \widetilde{p}(x, y) $ is also non-negligible.
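As a sketch of what this means in practice, assume samples $ (x_n, y_n) $, $ n = 1, \dots, N $, drawn from the proposal, with importance weights $ w_n $ (a notation introduced here for illustration). The self-normalised importance sampling estimate is then \begin{equation*} \Ex_{p(x, y)}[f(x, y)] \approx \frac{\sum_{n=1}^{N} w_n \, f(x_n, y_n)}{\sum_{n=1}^{N} w_n}, \qquad w_n = \frac{\widetilde{p}(x_n, y_n)}{\widetilde{q}(x_n, y_n)}. \end{equation*}

Any constant factor in $ \widetilde{p} $ or $ \widetilde{q} $ appears in both the numerator and the denominator and cancels, which is why neither normalisation constant is needed.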

In the following, we will implement importance sampling to estimate $ \Ex_{p(x, y)}[f(x, y)] $ for two choices of $ \widetilde{q}(x, y) $, a Uniform Distribution and a Gaussian Distribution.

**(optional) Plot the functions $f(x,y)$ and $\widetilde{p}(x, y)$ (see the 3D plotting section in this tutorial).**

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```

Draw random samples $ x_n $ and $ y_n $, $ n = 1, \dots, N $, each i.i.d. from the uniform distribution on the interval $[-10, 20] $.

Now use the samples from the distribution $ \widetilde{q}(x, y) $ to estimate $ \Ex_{p(x, y)}[f(x, y)] $ via importance sampling.
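One possible way to set this up is sketched below, assuming a self-normalised estimator (weighted sum of $f$ divided by the sum of the weights). The sample size, seed, and names such as `estimate_uniform` are illustrative assumptions, not part of the exercise statement.

```python
import numpy as np

def f(x, y):
    """Function whose expectation is sought."""
    return x * y * np.cos(x) * np.cos(y)

def p_tilde(x, y):
    """Unnormalised target density from the text."""
    r2 = (x - 2.0) ** 2 + (y - 3.0) ** 2
    return np.exp(-0.25 * r2) - 0.75 * np.exp(-0.5 * r2)

rng = np.random.default_rng(0)  # arbitrary seed
n = 200_000                     # sample size chosen for illustration

# Independent uniform draws on [-10, 20] for each coordinate.
x = rng.uniform(-10.0, 20.0, size=n)
y = rng.uniform(-10.0, 20.0, size=n)

# The uniform density is constant on the square, so it cancels in the
# self-normalised ratio and the weights reduce to p_tilde itself.
weights = p_tilde(x, y)
estimate_uniform = np.sum(weights * f(x, y)) / np.sum(weights)
print(estimate_uniform)
```

Most uniform samples land where $ \widetilde{p} $ is tiny and contribute almost nothing, so a large $N$ is needed before the estimate settles.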

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```

Create random vectors from the following Gaussian distribution $ \widetilde{q}(x, y) = \DNorm{(x, y)^T}{\mathbf{\mu} = (2, 3)^T}{\mathbf{\Sigma} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \\ \end{bmatrix}} $.

Now use the samples from the distribution $ \widetilde{q}(x, y) $ to estimate $ \Ex_{p(x, y)}[f(x, y)] $ via importance sampling.
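A minimal sketch for the Gaussian proposal follows, again assuming a self-normalised estimator. The proposal density is evaluated only up to its normalising constant, which cancels in the ratio. The effective sample size printed at the end is an extra diagnostic not asked for in the exercise, and all names are illustrative.

```python
import numpy as np

def f(x, y):
    """Function whose expectation is sought."""
    return x * y * np.cos(x) * np.cos(y)

def p_tilde(x, y):
    """Unnormalised target density from the text."""
    r2 = (x - 2.0) ** 2 + (y - 3.0) ** 2
    return np.exp(-0.25 * r2) - 0.75 * np.exp(-0.5 * r2)

rng = np.random.default_rng(0)  # arbitrary seed
n = 50_000                      # sample size chosen for illustration

mean = np.array([2.0, 3.0])
cov = np.array([[2.0, 0.0], [0.0, 3.0]])
samples = rng.multivariate_normal(mean, cov, size=n)
x, y = samples[:, 0], samples[:, 1]

# Unnormalised Gaussian proposal density: exp(-0.5 * d^T Sigma^{-1} d).
diff = samples - mean
q_tilde = np.exp(-0.5 * np.sum((diff @ np.linalg.inv(cov)) * diff, axis=1))

# Importance weights and self-normalised estimate.
weights = p_tilde(x, y) / q_tilde
estimate_gauss = np.sum(weights * f(x, y)) / np.sum(weights)

# Effective sample size: close to n when the weights are nearly equal.
ess = np.sum(weights) ** 2 / np.sum(weights ** 2)
print(estimate_gauss, ess)
```

Because this proposal is centred on the same point as $ \widetilde{p} $ and has heavier tails in $y$, the weights stay bounded and a large fraction of the samples contribute.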

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```

For the function $f(x, y)$ and the normalised distribution $p(x, y)$,
the correct result can be computed analytically
\begin{align*}
\Ex_{p(x, y)}[f(x, y)]
& = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, p(x, y) \dd x \dd y \\
& = \frac{16 (5 \cos(1) + \cos(5) + \sin(1) - 5 \sin(5)) - 3 e (7 \cos(1) + 5 \cos(5) + \sin(1) - 5 \sin(5))}{10 \, e^2} \\
& \approx 0.670859
\end{align*}

Compare the convergence rate of the two approaches given above. On a single plot, show the two curves of the empirical expectation as a function of the number of samples, together with the analytical value.
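One way to obtain the two curves without re-running the estimator for every sample size is to use cumulative sums, as sketched below. The helper `running_estimate`, the seed, and the sample size are assumptions made for illustration. The sketch prints absolute errors at a few sample sizes instead of plotting.

```python
import numpy as np

def f(x, y):
    """Function whose expectation is sought."""
    return x * y * np.cos(x) * np.cos(y)

def p_tilde(x, y):
    """Unnormalised target density from the text."""
    r2 = (x - 2.0) ** 2 + (y - 3.0) ** 2
    return np.exp(-0.25 * r2) - 0.75 * np.exp(-0.5 * r2)

def running_estimate(values, weights):
    """Self-normalised estimate after 1, 2, ..., N samples (hypothetical helper)."""
    return np.cumsum(weights * values) / np.cumsum(weights)

rng = np.random.default_rng(1)  # arbitrary seed
n = 100_000
analytic = 0.670859             # value quoted in the text

# Uniform proposal on [-10, 20]^2: constant density, so weights are p_tilde.
xu = rng.uniform(-10.0, 20.0, size=n)
yu = rng.uniform(-10.0, 20.0, size=n)
run_uniform = running_estimate(f(xu, yu), p_tilde(xu, yu))

# Gaussian proposal with mean (2, 3) and covariance diag(2, 3).
xg = rng.normal(2.0, np.sqrt(2.0), size=n)
yg = rng.normal(3.0, np.sqrt(3.0), size=n)
q_tilde = np.exp(-0.5 * ((xg - 2.0) ** 2 / 2.0 + (yg - 3.0) ** 2 / 3.0))
run_gauss = running_estimate(f(xg, yg), p_tilde(xg, yg) / q_tilde)

# Absolute error of each running estimate at a few sample sizes.
for k in (100, 1_000, 10_000, 100_000):
    print(k, abs(run_uniform[k - 1] - analytic), abs(run_gauss[k - 1] - analytic))
```

The arrays `run_uniform` and `run_gauss` can be passed directly to `plt.plot`, with `plt.axhline(analytic)` marking the analytical value. A logarithmic x-axis makes the early behaviour easier to see.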

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```

Discuss how one could choose an appropriate proposal distribution for a particular function $f(x,y)$ and distribution $p(x,y)$.

*--- replace this with your solution, add and remove code and markdown cells as appropriate ---*

- How this setup can be used to estimate a posterior predictive mean (very easy), and a posterior predictive variance (slightly trickier).

- An example case in Bayesian machine learning which would require sampling, i.e. one that is analytically intractable.
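As a hedged illustration of the mean and variance point, the sketch below estimates both from the same set of weighted samples, using the identity $\mathrm{Var}[z] = \Ex[z^2] - \Ex[z]^2$. The one-dimensional Gaussian target and proposal are toy stand-ins chosen so the answer is known (mean 1, variance 4). They are assumptions for illustration, not a posterior from this tutorial, and `weighted_mean_and_variance` is a hypothetical helper.

```python
import numpy as np

def weighted_mean_and_variance(values, weights):
    """Self-normalised estimates of the mean and variance of `values`."""
    w = weights / np.sum(weights)           # normalise the weights to sum to 1
    mean = np.sum(w * values)               # estimate of E[z]
    second_moment = np.sum(w * values ** 2) # estimate of E[z^2]
    return mean, second_moment - mean ** 2

rng = np.random.default_rng(2)  # arbitrary seed
n = 100_000

# Toy setup: unnormalised target N(1, 2^2), proposal N(0, 3^2).
z = rng.normal(0.0, 3.0, size=n)
target_unnorm = np.exp(-0.5 * ((z - 1.0) / 2.0) ** 2)
proposal_unnorm = np.exp(-0.5 * (z / 3.0) ** 2)

mean_hat, var_hat = weighted_mean_and_variance(z, target_unnorm / proposal_unnorm)
print(mean_hat, var_hat)
```

The variance needs two expectations from the same weights rather than one, which is what makes it slightly more involved than the mean.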

*--- replace this with your solution, add and remove code and markdown cells as appropriate ---*
