{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Jupyter notebooks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### COMP4670/8600 - Statistical Machine Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial introduces the *basic elements* of writing Python programs and using \n",
"[Jupyter notebooks](http://jupyter.org/). \n",
"\n",
"Because students come from a wide variety of backgrounds, it is also worth recalling some of the mathematics and statistics that this course builds upon."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic knowledge\n",
"\n",
"**IMPORTANT: When using mathematical formulas, provide the precise name for each component.**\n",
"\n",
"$\\newcommand{\\RR}{\\mathbb{R}}$\n",
"\n",
"\n",
"### Random variables\n",
"\n",
"Write down the definitions of the following entities, and provide a simple example to illustrate.\n",
"\n",
"1. The expectation of a function $f$ with respect to a\n",
" * continuous random variable $X$\n",
" * discrete random variable $X$\n",
"2. The variance of a random variable $X$.\n",
"3. Independence of two random variables $X$ and $Y$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Answer\n",
"*--- replace this with your solution, add and remove code and markdown cells as appropriate ---*"
]
},
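{
"cell_type": "markdown",
"metadata": {},
"source": [
"The definitions above can be sanity-checked numerically. The following cell is one possible sketch, not a model answer. It assumes a fair six-sided die as the discrete example and a standard normal $X \\sim \\mathcal{N}(0, 1)$ as the continuous one; the function $f(x) = x^2$ and the sample size are arbitrary choices. For the die, the expectation is the finite sum $\\sum_x f(x)\\,p(x)$. For the continuous case, the integral $\\int f(x)\\,p(x)\\,dx$ is approximated by a Monte Carlo average over samples, so that estimate is only approximate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Assumed discrete example: a fair six-sided die, p(x) = 1/6 for x = 1 to 6\n",
"values = np.arange(1, 7)\n",
"probs = np.full(6, 1.0 / 6)\n",
"\n",
"def f(x):\n",
"    # Arbitrary example function f(x) = x^2\n",
"    return x ** 2\n",
"\n",
"# Expectation of f: E[f(X)] = sum_x f(x) p(x)\n",
"expectation_f = np.sum(f(values) * probs)\n",
"# Variance: var[X] = E[X^2] - E[X]^2\n",
"mean_x = np.sum(values * probs)\n",
"var_x = np.sum(values ** 2 * probs) - mean_x ** 2\n",
"print('E[f(X)] =', expectation_f, ' E[X] =', mean_x, ' var[X] =', var_x)\n",
"\n",
"# Assumed continuous example: X ~ N(0, 1). E[f(X)] = integral of f(x) p(x) dx\n",
"# is approximated by averaging f over samples drawn from p\n",
"np.random.seed(0)\n",
"samples = np.random.randn(100000)\n",
"print('Monte Carlo estimate of E[X^2] under N(0, 1):', np.mean(f(samples)))"
]
},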
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Discrete probabilities\n",
"\n",
"For discrete random variables $X$ and $Y$, define the following, and show how each applies to the joint distribution in the table below.\n",
"\n",
"| $p(\\mathbf{X},\\mathbf{Y})$ | X=a | X=b | X=c | X=d | X=e |\n",
"|:--------------------------:|:--:|:--:|:--:|:--:|:--:|\n",
"| **Y** = red |0.2 |0.1 |0.1 |0.01|0.04|\n",
"| **Y** = green |0.08|0.07|0.01|0.05|0.05|\n",
"| **Y** = blue |0.01|0.01|0.07|0.05|0.15|\n",
"\n",
"1. The sum rule of probability theory\n",
"2. The product rule of probability theory\n",
"3. Independence of two random variables $X$ and $Y$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Answer\n",
"*--- replace this with your solution, add and remove code and markdown cells as appropriate ---*"
]
},
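{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration, the joint table can be stored as a NumPy array and the rules checked numerically. In this sketch, rows are indexed by $Y$ (red, green, blue) and columns by $X$ (a to e), matching the table above. The sum rule gives each marginal by summing the joint over the other variable, the product rule gives the conditional $p(X \\mid Y) = p(X, Y) / p(Y)$, and $X$ and $Y$ are independent only if $p(X, Y) = p(X)\\,p(Y)$ holds in every cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Joint distribution p(X, Y) from the table: rows are Y, columns are X = a, ..., e\n",
"joint = np.array([\n",
"    [0.20, 0.10, 0.10, 0.01, 0.04],  # Y = red\n",
"    [0.08, 0.07, 0.01, 0.05, 0.05],  # Y = green\n",
"    [0.01, 0.01, 0.07, 0.05, 0.15],  # Y = blue\n",
"])\n",
"assert np.isclose(joint.sum(), 1.0)  # a valid joint distribution sums to one\n",
"\n",
"# Sum rule: p(X) = sum over Y of p(X, Y), and p(Y) = sum over X of p(X, Y)\n",
"p_x = joint.sum(axis=0)\n",
"p_y = joint.sum(axis=1)\n",
"print('p(X) =', p_x)\n",
"print('p(Y) =', p_y)\n",
"\n",
"# Product rule: p(X, Y) = p(X | Y) p(Y), so p(X | Y) = p(X, Y) / p(Y)\n",
"p_x_given_y = joint / p_y[:, np.newaxis]\n",
"print('p(X | Y = red) =', p_x_given_y[0])\n",
"\n",
"# Independence requires p(X, Y) = p(X) p(Y) in every cell\n",
"independent = np.allclose(joint, np.outer(p_y, p_x))\n",
"print('Independent:', independent)"
]
},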
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculus\n",
"\n",
"Compute the gradient of the following function $f:\\RR\\to\\RR$\n",
"$$\n",
"f(x) = \\frac{1}{1 + \\exp(x^2)}\n",
"$$\n",
"What would the gradient be if $x$ were two-dimensional (that is, $f:\\RR^2\\to\\RR$)? Generalise the scalar function above appropriately."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Answer\n",
"*--- replace this with your solution, add and remove code and markdown cells as appropriate ---*"
]
},
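{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whatever analytic gradient you derive can be checked numerically with central finite differences, $\\partial f/\\partial x_i \\approx \\big(f(\\mathbf{x} + h\\,\\mathbf{e}_i) - f(\\mathbf{x} - h\\,\\mathbf{e}_i)\\big) / (2h)$, where $\\mathbf{e}_i$ is the $i$-th unit vector. This sketch assumes that the two-dimensional generalisation replaces $x^2$ by $\\mathbf{x}^\\top\\mathbf{x}$; the helper name `numerical_gradient` and the step size `h` are illustrative choices, not part of the exercise."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def f(x):\n",
"    # Assumed generalisation f(x) = 1 / (1 + exp(x^T x)); for scalar x this is 1 / (1 + exp(x^2))\n",
"    x = np.atleast_1d(np.asarray(x, dtype=float))\n",
"    return 1.0 / (1.0 + np.exp(x @ x))\n",
"\n",
"def numerical_gradient(func, x, h=1e-6):\n",
"    # Hypothetical helper: central differences, perturbing one coordinate at a time\n",
"    x = np.atleast_1d(np.asarray(x, dtype=float))\n",
"    grad = np.zeros_like(x)\n",
"    for i in range(x.size):\n",
"        step = np.zeros_like(x)\n",
"        step[i] = h\n",
"        grad[i] = (func(x + step) - func(x - step)) / (2 * h)\n",
"    return grad\n",
"\n",
"# Compare these numbers against your analytic gradient at the same points\n",
"print(numerical_gradient(f, 0.5))\n",
"print(numerical_gradient(f, [0.5, -1.0]))"
]
},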
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Python and Programming for Machine Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*If you already know Python and Jupyter notebooks well, please work on Tutorial 1b \"Matrix decomposition\"*\n",
"\n",
"The introduction focuses on the concepts needed to write small Python programs for machine learning. We assume that whoever uses the code is reasonably knowledgeable, so we can *skip* most of the code a robust system would need to *check* the input types, *verify* the input parameter ranges, and *make sure* that nothing can go wrong when somebody else uses the code.\n",
"Having said this, you are nevertheless encouraged to include some sanity tests in your code to avoid simple errors that can cost you a lot of time to find.\n",
"Some of the Python concepts discussed in the tutorial are:\n",
"- Data types (bool, int, float, str, list, tuple, set, dict)\n",
"- Operators\n",
"- Data flow\n",
"- Functions\n",
"- Classes and objects\n",
"- Modules and how to use them\n",
"\n",
"**We will be using [Python3](https://wiki.python.org/moin/Python2orPython3) in this course**.\n",
"\n",
"Some resources:\n",
"- [Codecademy](http://www.codecademy.com/en/tracks/python) gives a step-by-step introduction to Python\n",
"- [How to think like a computer scientist](http://interactivepython.org/courselib/static/thinkcspy/index.html) does what it says, using Python\n",
"\n",
"## Installation\n",
"\n",
"The easiest way to get a working Python environment is using one of the following collections:\n",
"- [Enthought Canopy](https://store.enthought.com/)\n",
"- [Anaconda](http://continuum.io/downloads)\n",
"\n",
"It is also not too difficult to install Python using your favourite package manager and then use [conda](http://conda.pydata.org/docs/) or [pip](http://en.wikipedia.org/wiki/Pip_%28package_manager%29) to manage Python packages."
]
},
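{
"cell_type": "markdown",
"metadata": {},
"source": [
"A compact tour of the concepts listed above: built-in data types, operators, control flow, functions, classes and objects, and importing a module. This is a sketch for orientation only; the names `mean` and `RunningMean` are hypothetical helpers made up for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math  # modules: import a standard-library module and use its functions\n",
"\n",
"# Built-in data types\n",
"flag = True                         # bool\n",
"count = 3                           # int\n",
"ratio = 0.5                         # float\n",
"name = 'gaussian'                   # str\n",
"values = [1.0, 2.0, 4.0]            # list: mutable, ordered\n",
"shape = (2, 5)                      # tuple: immutable, ordered\n",
"labels = {-1, 1}                    # set: unordered, unique elements\n",
"params = {'mean': 0.0, 'var': 1.0}  # dict: key -> value\n",
"print(name, count * ratio, len(labels), shape[0])\n",
"\n",
"# Operators and control flow\n",
"total = 0.0\n",
"for v in values:\n",
"    if v > 1.5 and flag:\n",
"        total += v ** 2\n",
"print(total)  # 20.0\n",
"\n",
"# Functions: return the arithmetic mean of a non-empty sequence\n",
"def mean(xs):\n",
"    return sum(xs) / len(xs)\n",
"\n",
"# Classes and objects: a hypothetical helper that keeps a running average\n",
"class RunningMean:\n",
"    def __init__(self):\n",
"        self.n = 0\n",
"        self.total = 0.0\n",
"\n",
"    def update(self, x):\n",
"        self.n += 1\n",
"        self.total += x\n",
"        return self.total / self.n\n",
"\n",
"rm = RunningMean()\n",
"for v in values:\n",
"    current = rm.update(v)\n",
"print(mean(values), current, math.sqrt(params['var']))"
]
},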
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Jupyter Notebooks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**To work on a worksheet or assignment, download the notebook and edit it locally.**\n",
"\n",
"[Jupyter notebooks](http://jupyter.org/) provide a convenient browser-based environment for data analysis in a literate programming style. The descriptive parts of the notebook implement an enhanced version of [markdown](http://daringfireball.net/projects/markdown/syntax), which allows the use of [LaTeX](http://www.latex-project.org/) for rendering equations.\n",
"1. Descriptive notes\n",
" - Markdown\n",
" - LaTeX\n",
"2. Computational code\n",
" - numerical python\n",
" * numpy\n",
" * scipy\n",
" - matplotlib\n",
" \n",
"To use a notebook locally:\n",
"```bash\n",
"jupyter notebook name_of_file.ipynb\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Markdown and LaTeX\n",
"\n",
"In addition to the lists and links already shown above, \n",
"tables are also nice and easy to write:\n",
"\n",
"|Title | Middle| Left aligned | Right aligned |\n",
"|------|:-----:|:-----|--:|\n",
"|Monday|10:00|Sunny|*30*|\n",
"|Thursday|12:32|Rain|*22.3*|\n",
"\n",
"It is also easy to typeset good looking equations inline, such as $f(x) = x^2$, or on a line by itself.\n",
"\\begin{equation}\n",
" g(x) = \\sum_{i=1}^n \\frac{\\prod_{j=1}^d y_j \\sqrt{3x_i^4}}{f(x_i)}\n",
"\\end{equation}\n",
"If you use a symbol often, you can define it once at the top of a document as follows (look at the cell source), and then use it in equations.\n",
"\n",
"$\\newcommand{\\amazing}{\\sqrt{3x_i^4}}$\n",
"\n",
"\\begin{equation}\n",
" h(x) = \\sum_{i=1}^n \\amazing\n",
"\\end{equation}\n",
"\n",
"## Computational code\n",
"\n",
"Setting up the Python environment ([do not use pylab](http://carreau.github.io/posts/10-No-PyLab-Thanks.ipynb.html)):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import scipy as sp\n",
"\n",
"%matplotlib inline"
]
},
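{
"cell_type": "markdown",
"metadata": {},
"source": [
"To confirm that the installation works, it can help to print the interpreter and package versions. This sketch only checks that the imports succeed; it does not assume any particular version of NumPy, SciPy or Matplotlib."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"import matplotlib\n",
"import numpy as np\n",
"import scipy as sp\n",
"\n",
"# Print interpreter and package versions to confirm the environment is usable\n",
"print('Python     ', sys.version.split()[0])\n",
"print('NumPy      ', np.__version__)\n",
"print('SciPy      ', sp.__version__)\n",
"print('Matplotlib ', matplotlib.__version__)"
]
},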
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some resources:\n",
"* [Tentative NumPy Tutorial](http://wiki.scipy.org/Tentative_NumPy_Tutorial)\n",
"* [SciPy Tutorial](http://docs.scipy.org/doc/scipy/reference/tutorial/)\n",
"* [Matplotlib PyPlot Tutorial](http://matplotlib.org/1.3.1/users/pyplot_tutorial.html)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Toy dataset for debugging\n",
"\n",
"Write a function `gen_data` that generates data from two Gaussians with unit variance, centered at $\\mathbf{1}$ and $-\\mathbf{1}$ respectively, where $\\mathbf{1}$ is the vector of all ones.\n",
"\n",
"*Hint: use `np.ones` and `np.random.randn`*\n",
"\n",
"Use the function to generate 100 samples from each Gaussian, with a 5-dimensional feature space."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# replace this with your solution, add and remove code and markdown cells as appropriate"
]
},
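{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch, following the hint. The signature `gen_data(n_samples, n_dims)` and the returned labels `+1`/`-1` are assumptions; the exercise only fixes the two means and the unit variance. Adding a mean vector to standard normal draws from `np.random.randn` shifts the centre while keeping the identity covariance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def gen_data(n_samples, n_dims):\n",
"    # Assumed signature: draw n_samples points from each of N(1, I) and N(-1, I).\n",
"    # Returns X with shape (2 * n_samples, n_dims) and labels y in {+1, -1}.\n",
"    ones = np.ones(n_dims)\n",
"    X_pos = np.random.randn(n_samples, n_dims) + ones  # centred at the all-ones vector\n",
"    X_neg = np.random.randn(n_samples, n_dims) - ones  # centred at minus the all-ones vector\n",
"    X = np.vstack([X_pos, X_neg])\n",
"    y = np.concatenate([np.ones(n_samples), -np.ones(n_samples)])\n",
"    return X, y\n",
"\n",
"np.random.seed(0)  # fixed seed so the data is reproducible while debugging\n",
"X, y = gen_data(100, 5)\n",
"print(X.shape, y.shape)  # (200, 5) (200,)"
]
},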
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `gen_data` to generate 30 samples from each Gaussian, with a 2-dimensional feature space. Plot this data.\n",
"\n",
"Discuss:\n",
"- Can you see two bumps?\n",
"- Does the data look Gaussian?\n",
"- What happens with more dimensions?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# replace this with your solution, add and remove code and markdown cells as appropriate"
]
},
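{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal plotting sketch. It assumes the construction described for `gen_data` (unit-variance Gaussians centred at $\\mathbf{1}$ and $-\\mathbf{1}$) but regenerates the points inline so the cell runs on its own. Using different markers for the two Gaussians and equal axis scales makes it easier to judge whether two bumps are visible."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"np.random.seed(1)\n",
"n_samples, n_dims = 30, 2\n",
"X_pos = np.random.randn(n_samples, n_dims) + np.ones(n_dims)\n",
"X_neg = np.random.randn(n_samples, n_dims) - np.ones(n_dims)\n",
"\n",
"fig, ax = plt.subplots(figsize=(5, 5))\n",
"ax.scatter(X_pos[:, 0], X_pos[:, 1], marker='o', label='mean (1, 1)')\n",
"ax.scatter(X_neg[:, 0], X_neg[:, 1], marker='x', label='mean (-1, -1)')\n",
"ax.set_xlabel('feature 1')\n",
"ax.set_ylabel('feature 2')\n",
"ax.set_aspect('equal')  # equal scales so isotropic Gaussians look round\n",
"ax.legend()\n",
"plt.show()"
]
},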
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading and writing CSV\n",
"\n",
"Write the data to a CSV file. Confirm that you can read it back using Python, and also inspect the file manually with a text editor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# replace this with your solution, add and remove code and markdown cells as appropriate"
]
},
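{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to do this with NumPy, as a sketch: `np.savetxt` writes a delimited text file and `np.loadtxt` reads it back. The file name `toy_data.csv`, the header row, and storing the label as the last column are assumptions. The default number format of `np.savetxt` writes enough digits that the values read back match the originals."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(2)\n",
"n_samples, n_dims = 30, 2\n",
"X = np.vstack([np.random.randn(n_samples, n_dims) + 1,\n",
"               np.random.randn(n_samples, n_dims) - 1])\n",
"y = np.concatenate([np.ones(n_samples), -np.ones(n_samples)])\n",
"data = np.column_stack([X, y])  # one row per sample, label in the last column\n",
"\n",
"# Write comma-separated values with a plain header line (comments='' drops the default '# ' prefix)\n",
"np.savetxt('toy_data.csv', data, delimiter=',', header='x1,x2,label', comments='')\n",
"\n",
"# Read the file back, skipping the header line\n",
"loaded = np.loadtxt('toy_data.csv', delimiter=',', skiprows=1)\n",
"print(loaded.shape, np.allclose(loaded, data))  # (60, 3) True"
]
},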
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}