{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Mixture Models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###### COMP4670/8600 - Introduction to Statistical Machine Learning - Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\newcommand{\\trace}[1]{\\operatorname{tr}\\left\\{#1\\right\\}}$\n", "$\\newcommand{\\Norm}[1]{\\lVert#1\\rVert}$\n", "$\\newcommand{\\RR}{\\mathbb{R}}$\n", "$\\newcommand{\\inner}[2]{\\langle #1, #2 \\rangle}$\n", "$\\newcommand{\\DD}{\\mathscr{D}}$\n", "$\\newcommand{\\grad}[1]{\\operatorname{grad}#1}$\n", "$\\DeclareMathOperator*{\\argmin}{arg\\,min}$\n", "Run this cell to set up $\\LaTeX$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bishop Exercise 9.7\n", "\n", "Verify that maximization of the complete-data log likelihood,\n", "\\begin{align}\n", "\\ln p(\\mathbf{X},\\mathbf{Z}|\\mathbf{\\mu},\\mathbf{\\Sigma},\\mathbf{\\pi}) \n", "& = \n", "\\sum_{n=1}^N \\sum_{k=1}^K z_{nk} \\{ \\ln \\pi_k + \\ln \\mathcal{N}(\\mathbf{x}_n | \\mathbf{\\mu}_k, \\mathbf{\\Sigma}_k \\},\n", "\\end{align}\n", "for a Gaussian mixture model leads to the result that the means and covariances of each component are fitted independently to the corresponding group of data points, and the mixing coefficients are given by the fractions of points in each group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solution description" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bishop Exercise 9.8\n", "\n", "Show that if we maximize\n", "\\begin{align}\n", "\\mathbb{E}_\\mathbf{Z} \\left[ \\ln p(\\mathbf{X},\\mathbf{Z}|\\mathbf{\\mu},\\mathbf{\\Sigma},\\mathbf{\\pi}) \\right]\n", "& = \n", "\\sum_{n=1}^N \\sum_{k=1}^K \\gamma(z_{nk}) \\{ \\ln \\pi_k + \\ln \\mathcal{N}(\\mathbf{x}_n | \\mathbf{\\mu}_k \\mathbf{\\Sigma}_k \\},\n", "\\end{align}\n", "with respect to $\\mathbf{\\mu}_k$ while keeping the responsibilities $\\gamma(z_{nk})$ fixed, we obtain the closed form solution\n", "\\begin{align}\n", "\\mathbf{\\mu}_k & = \\frac{1}{N_k} \\sum_{n=1}^N \\gamma(z_{nk}).\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solution description" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bishop Exercise 9.9\n", "\n", "Show that if we instead maximize with respect to $\\mathbf{\\Sigma}_k$ and $\\mathbf{\\pi}$ while keeping the responsibilities $\\gamma(z_{nk})$ fixed, we obtain the closed form solution\n", "\\begin{align}\n", "\\mathbf{\\Sigma}_k \n", "& = \n", "\\frac{1}{N_k} \\sum_{n=1}^N \\gamma(z_{nk}) (\\mathbf{x}_n-\\mathbf{\\mu}_k)(\\mathbf{x}_n-\\mathbf{\\mu}_k)^\\top, \\\\\n", "\\pi_k &= \\frac{N_k}{N}.\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solution description" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mixture Model Implementation\n", "\n", "### Get the data\n", "\n", "Download the famous old faithful dataset from here:\n", "\n", "https://machlearn.gitlab.io/isml2017/tutorial/faithful.csv\n", "\n", "The data is two dimensional, denoting the waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# load the old faithful dataset and standardise\n", "\n", "data = np.loadtxt('faithful.csv', delimiter=',')\n", "data_labels = ('Eruption length','Eruption wait')\n", "\n", "data -= np.mean(data, axis=0)\n", "data /= np.std(data, axis=0)\n", "data = data.T\n", "\n", "assert data.shape == (2, 272)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting helper functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# plot_cov_ellipse was taken from here:\n", "# http://www.nhsilbert.net/source/2014/06/bivariate-normal-ellipse-plotting-in-python/\n", "\n", "def plot_cov_ellipse(cov, pos, volume=.5, ax=None, fc='none', ec=[0,0,0], a=1, lw=2):\n", " \"\"\"\n", " Plots an ellipse enclosing *volume* based on the specified covariance\n", " matrix (*cov*) and location (*pos*). Additional keyword arguments are passed on to the \n", " ellipse patch artist.\n", "\n", " Parameters\n", " ----------\n", " cov : The 2x2 covariance matrix to base the ellipse on\n", " pos : The location of the center of the ellipse. Expects a 2-element\n", " sequence of [x0, y0].\n", " volume : The volume inside the ellipse; defaults to 0.5\n", " ax : The axis that the ellipse will be plotted on. Defaults to the \n", " current axis.\n", " \"\"\"\n", "\n", " import numpy as np\n", " from scipy.stats import chi2\n", " import matplotlib.pyplot as plt\n", " from matplotlib.patches import Ellipse\n", "\n", " def eigsorted(cov):\n", " vals, vecs = np.linalg.eigh(cov)\n", " order = vals.argsort()[::-1]\n", " return vals[order], vecs[:,order]\n", "\n", " if ax is None:\n", " ax = plt.gca()\n", "\n", " vals, vecs = eigsorted(cov)\n", " theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))\n", "\n", " kwrg = {'facecolor':fc, 'edgecolor':ec, 'alpha':a, 'linewidth':lw}\n", "\n", " # Width and height are \"full\" widths, not radius\n", " width, height = 2 * np.sqrt(chi2.ppf(volume,2)) * np.sqrt(vals)\n", " ellip = Ellipse(xy=pos, width=width, height=height, angle=theta, **kwrg)\n", "\n", " ax.add_artist(ellip)\n", " \n", "\n", "def plot_components(mu, Sigma, colours, *args, **kwargs):\n", " '''\n", " Plot ellipses for the bivariate normals with mean mu[:,i] and covariance Sigma[:,:,i]\n", " '''\n", " assert mu.shape[1] == Sigma.shape[2]\n", " assert mu.shape[0] == 2\n", " assert Sigma.shape[0] == 2\n", " assert Sigma.shape[1] == 2\n", " for i in range(mu.shape[1]):\n", " kwargs['ec'] = colours[i]\n", " plot_cov_ellipse(Sigma[:,:,i], mu[:,i], *args, **kwargs)\n", "\n", "import matplotlib.colors as mcol\n", "\n", "br_cmap = mcol.LinearSegmentedColormap.from_list(\"MyCmapName\",[\"b\",\"r\"])\n", "\n", "def plot_data(redness=None):\n", " if redness is not None:\n", " assert len(redness) == data.shape[1]\n", " assert all(_ >= 0 and _ <= 1 for _ in redness)\n", " c = redness\n", " else:\n", " c = 'g'\n", " plt.figure(figsize=(8,8))\n", " plt.scatter(data[0,:],data[1,:], marker='.', s=8, linewidths=2, c=c, cmap=br_cmap)\n", " plt.xlabel(data_labels[0])\n", " plt.ylabel(data_labels[1])\n", " plt.axis([-2,2,-2,2], 'equal')\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# initialise the mixture components and plot\n", "\n", "pi_0 = np.array([0.5,0.5])\n", "mu_0 = np.array([[-1,1],[1,-1]], dtype=np.float)\n", "Sigma_0 = np.concatenate(\n", " [np.eye(2).reshape(2,2,1) for _ in range(mu_0.shape[1])],\n", " axis=2,\n", ")\n", "\n", "plot_data()\n", "plot_components(mu_0, Sigma_0, ['b','r'], 0.2)\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimise the Gaussian Mixture with EM\n", "\n", "Using the equations you have derived, along with the relevant related material from the Bishop textbook, perform maximum likelihood estimation for $\\mathbf{\\mu}$, $\\mathbf{\\Sigma}$ and $\\mathbf{\\pi}$.\n", "\n", "Make some nice plots. Note that the plot_data function takes an argument to control the color of the scatter plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpret the Previous Algorithm in terms of General EM\n", "\n", "Refer to Bishop and/or the lecture slides, and explain with equations:\n", "1. the assumed generative data generating process for the mixture of Gaussians,\n", "- what the parameters of the model are,\n", "- what the latent (unobserved) variables are and why they make maximum likelihood parameter estimation non-trivial,\n", "- how EM provides a solution to the maximum likelihood estimation problem.\n", "\n", "Make yourself comfortable with EM to the point that you could readily apply it to another model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solution description" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }