Graphical Models

Textbook Question

  • Q8.20: Induction on graph structure (recall from MATH1005/6005) (Difficulty $\star$)
  • Q8.21: Note typo: it should be $f_s(x_s)$ (Difficulty $\star\star$)
  • Q8.27: Construct example showing greedy method not working (Difficulty $\star\star$)
  • Q8.29: Induction on tree structure (recall from MATH1005/6005) (Difficulty $\star\star$)
  • Extra: Derive eq 8.74 to 8.85 w.r.t Fig 8.51

  • Q10.2: Solving simulataneous equations (Difficulty $\star$)

  • Q10.3: Use lagrangian to enforce normalisation of q (Difficulty $\star\star$)
  • Q10.6: Hint, how to introduce log term for both p and q? (Difficulty $\star\star$)

  • Q2.44: Manipulation with a more complex conjugate to derive the posterior (Difficulty $\star\star$)

  • Q10.7: Rewritting to the form of the respective distributions. Mostly algebra. (Difficulty $\star\star$)

  • Q10.8: What will $b_n$ be approximated as? (Difficulty $\star$)
  • Q10.9: Essentially, deriving 10.31 and 10.32 (Difficulty $\star\star$)
COMP4670/8600 - Statistical Machine Learning - Tutorial

Assumed knowledge

  • Directed and undirected graphical models (lectures)

After this lab, you should be comfortable with:

  • Basic operations and definitions of graphical models

Setting up the environment

In [ ]:
import numpy as np
from IPython.display import display, Image

This tutorial mainly includes three parts: Bayesian Networks (BN), Markov Random Field (MRF) and Sum Product Algorithm (Factor Graph). Before diving into the graphical models, we will first review some basic probability concepts.

Reviewing Probability

Discrete Probability

Recall the meaning of the following terms:

  • Joint probability distribution
  • Marginal distribution
  • Conditional distribution

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Consider the following table defining the joint probability distribution of two variables $A$ and $B$.

A=$\square$ A=$\bigcirc$ A = $\clubsuit$ A = $\heartsuit$ A = $\triangle$
B=$p$ 0.01 0.01 0.12 0.01 0.14
B=$q$ 0.03 0.15 0.01 0.01 0.01
B=$r$ 0.13 0.11 0.07 0.18 0.01

Compute the following distributions:

  • $p(B=p | A = \bigcirc)$
  • $p(B | A = \bigcirc)$
  • $p(B)$

You may do this calculation by hand or using python.

In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Bayes Rule

Recall that there are only two rules of probability, the sum rule and the product rule. Using these two rules, prove Bayes rule.

$$p(Y|X) = \frac{p(X|Y)p(Y)}{\sum_Y p(X,Y)}$$

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Empirical verification of Bayes rule

Using the distribution $p(A,B)$ above, verify the Bayes rule i.e.

$$p(A|B) = \frac{p(B|A)p(A)}{\sum_A p(A,B)}$$
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Dependent random variables

Consider the following problem with 5 random variables.

  • Aches with states (False, True)
  • Bronchitis with states (none, mild, severe)
  • Cough with states (False, True)
  • Disease with states (healthy, carrier, sick, recovering)
  • Emergency with states (False, True)

How much memory is needed to store the joint probability distribution if:

  • All variables are dependent?
  • All variables are independent?

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Bayesian Network

Bayesian Network is directed graphical model expressing causal relationship between variables.

Consider the following graphical model. Anwser the following questions:

In [ ]:
Image(url="https://machlearn.gitlab.io/sml2020/tutorials/graphical_model.png")

No1. Write down the joint factorisation for the above graph.

No2. How much memory is needed to store the joint probability distribution?

Conditional Independence (D-seperation)

If all paths are blocked between nodes $X$, $Y$ when a set of nodes $Z$ is observed, then $X$ is $d$-separated from $Y$ by $Z$ and $X \perp Y | Z$. A path is blocked if it includes a node such that:

  • the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in $Z$
  • the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in $Z$.

No3. Identify and prove whether the conditional independences holds for the following cases:

  • A and D, when B is observed.
  • B and C, when none of the variables are observed.
  • B and C, when E is observed.
  • A and C, when none of the variables are observed.
  • A and C, when B is observed.
  • A and E, when D is observed.

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Calculate distributions for BN

Consider the following tables.

$p(B)$ B=n B=m B=s
marginal 0.97 0.01 0.02
$p(C)$ C=False C=True
marginal 0.7 0.3
$p(A B)$ B=n B=m B=s
A=False 0.9 0.8 0.3
A=True 0.1 0.2 0.7
$p(D B,C)$ B=n, C=F B=m, C=F B=s, C=F B=n, C=T B=m, C=T B=s, C=T
D=healthy 0.9 0.8 0.1 0.3 0.4 0.01
D=carrier 0.08 0.17 0.01 0.05 0.05 0.01
D=sick 0.01 0.01 0.87 0.05 0.15 0.97
D=recovering 0.01 0.02 0.02 0.6 0.4 0.01
$p(E D)$ D=h D=c D=s D=r
E=False 0.99 0.99 0.4 0.9
E=True 0.01 0.01 0.6 0.1

Compute the following:

  • p(A,B,C,D,E)
  • p(E)
  • p(E|B=s)
  • p(E|B=s, C=T)

Note that there are two ways of arriving at the distributions:

  1. By computing p(A,B,C,D,E) and marginalising and conditioning appropriately
  2. By only computing the required distributions directly using the graphical model structure.

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Sampling a Discrete Probability Mass Function (PMF)

  1. Code a class which represents a PMF, and has a sample method which draws a sample from the PMF.
  2. Use it to draw samples from $p(C)$ and empirically compute and print the probability that your sampler returns the value False.
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Markov Random Field

For Bayesian Networks, create the cliques of a MRF by adding undirected edges bewteen all pairs of parents for each node in the graph. This process of 'marrying the parents' is called moralisation. But note that the resulting MRF may represent different conditional independence statements than the original BN.

Convert the Bayesian Network above to a Markov Random Field. Draw the resulting network.

In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Identify the maximal cliques in the graph and write down the corresponding clique potentials. Compare the conditional independence statements of the MRF with the BN.

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Sum Product Algorithm

The aim of this exercise is to implement the sum product algorithm on a chain graph.

Distributive law

The distributive property can be used to save computation, and is the basis of message passing and dynamic programming. See an anecdote about camels.

Consider the following equation: $$ 2*3 + 2*5 = 2 * (3+5) $$

  • How many mathematical operations (multiplications and additions) are on the left hand side?
  • How many mathematical operations are on the right hand side?

Construct a larger example where there is even more computational savings.

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Message passing

Consider the following factor graph.

In [ ]:
Image(url="https://machlearn.gitlab.io/sml2020/tutorials/message_passing.png")

The factors are given by the following tables:

f(A,B) A=$\square$ A=$\bigcirc$ A = $\clubsuit$ A = $\heartsuit$ A = $\triangle$
B=$p$ 0.01 0.01 0.12 0.01 0.14
B=$q$ 0.03 0.15 0.01 0.01 0.01
B=$r$ 0.13 0.11 0.07 0.18 0.01
g(B,C) B=$p$ B=$q$ B=$r$
C=$w$ 0.05 0.06 0.07
C=$x$ 0.1 0.3 0.2
C=$y$ 0.03 0.02 0.1
C=$z$ 0.11 0.15 0.08
h(C)
C=$w$ 1.2
C=$x$ 3.2
C=$y$ 1.8
C=$z$ 2.3

Using the sum product algorithm, compute the marginal distribution of the random variable $B$.

Hint: Note that the factors are not normalised.

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate
In [ ]: