In this lab, we will use linear regression to predict the value of a home and explore the impact of regularisation.

- Maximum likelihood solution to a linear regression problem, with and without regularisation (lectures)
- Matrix calculations in numpy (lab and precourse material)
- Theory behind regularisation (lectures)

- Practical linear regression problems
- Picking an appropriate regularisation parameter for a given problem

$\newcommand{\trace}[1]{\operatorname{tr}\left\{#1\right\}}$ $\newcommand{\Norm}[1]{\lVert#1\rVert}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $\newcommand{\DD}{\mathscr{D}}$ $\newcommand{\grad}[1]{\operatorname{grad}#1}$ $\DeclareMathOperator*{\argmin}{arg\,min}$

Setting up the environment

In [ ]:

```
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
```

We will use a dataset on the price of housing in Boston (see description). We aim to predict the value of a home from other factors. In this dataset, each row is one house. The first entry is the value of the house and we will predict it from the remaining values which have been normalised to be in the range $[-1, 1]$. The column labels are

`'medv', 'crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio', 'b', 'lstat'`

Download the dataset. Read in the data using `np.loadtxt` with the optional argument `delimiter=','`, as our data is comma separated rather than space separated. Remove the column containing the binary variable `'chas'`.

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```
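One possible sketch of this step, using an in-memory stand-in for the downloaded file (with the real dataset you would pass the CSV's file path to `np.loadtxt` instead; the column index of `'chas'` below matches the toy data, not necessarily your file):

```python
import numpy as np
from io import StringIO

# Stand-in for the downloaded CSV; replace with the path to your file.
csv_text = StringIO("24.0,0.1,-0.5,0.3,1.0,0.2\n21.6,0.2,0.4,-0.1,0.0,0.3\n")

data = np.loadtxt(csv_text, delimiter=',')   # comma-separated, not whitespace
chas_index = 4                               # position of 'chas' in this toy example
data = np.delete(data, chas_index, axis=1)   # drop the binary 'chas' column
print(data.shape)
```

`np.delete` returns a copy with the given column removed, so the original array is untouched.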

Check that the data is as expected using `print()`: it should have 506 rows (examples) and 13 columns (1 label and 12 features). Check that this is the case.

Hint: use `assert`.

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```
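A minimal sketch of the check, with a zero array standing in for your loaded data:

```python
import numpy as np

data = np.zeros((506, 13))   # stand-in for the array loaded from the CSV
print(data.shape)            # quick visual check
assert data.shape == (506, 13), "expected 506 examples with 1 label + 12 features"
```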

Implement a **function** to find the maximum likelihood solution $w_{ML}$ assuming Gaussian noise for this linear regression problem. Remember from the lectures that this is equivalent to a linear regression problem with the cost function set as the sum of squares error.

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```

Use a fifth of the available data for training the model using maximum likelihood. The rest of the data is allocated to the test set. Report the root mean squared error (RMSE) for the training set and the test set. In this case, use the identity map as the basis function, $\phi(x)=x$.

Note that the data may be sorted or ordered in some way we cannot predict. How will you account for this?

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```
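A sketch of the split-and-evaluate step on synthetic data standing in for the Boston array (note the random permutation, which guards against the data being sorted in some way):

```python
import numpy as np

def rmse(pred, target):
    return np.sqrt(np.mean((pred - target) ** 2))

rng = np.random.default_rng(42)
data = rng.normal(size=(506, 13))                 # stand-in for the real array
data[:, 0] = data[:, 1:] @ rng.normal(size=12)    # synthetic linear labels

idx = rng.permutation(len(data))                  # shuffle: data may be ordered
n_train = len(data) // 5                          # a fifth for training
train, test = data[idx[:n_train]], data[idx[n_train:]]

X_train, y_train = train[:, 1:], train[:, 0]
X_test, y_test = test[:, 1:], test[:, 0]

w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
print(rmse(X_train @ w, y_train), rmse(X_test @ w, y_test))
```

On the real data the test RMSE will typically exceed the training RMSE; here both are near zero only because the synthetic labels are noiseless.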

Find the feature with the biggest magnitude of weight. Using `matplotlib` (docs for `matplotlib.pyplot.plot`), create a plot of this feature against the label for the datapoints in the training set. In a different colour, plot this feature against the predicted label. Create a similar plot for the test data.

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```
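A plotting sketch for the training set, again with synthetic stand-ins for the fitted weights and data (the `Agg` backend line is only needed when running outside the notebook):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend; unnecessary inside the notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(100, 12))   # stand-in features
w = rng.normal(size=12)                        # stand-in fitted weights
y_train = X_train @ w + rng.normal(scale=0.1, size=100)

j = int(np.argmax(np.abs(w)))                  # feature with largest |weight|
plt.scatter(X_train[:, j], y_train, color='C0', label='true label')
plt.scatter(X_train[:, j], X_train @ w, color='C1', label='predicted label')
plt.xlabel(f'feature {j}')
plt.ylabel('medv')
plt.legend()
plt.show()
```

The same code applied to `X_test`, `y_test` gives the test-set plot.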

Implement a **function** to find the maximum likelihood solution $w_{reg}$ for some regularisation parameter $\lambda > 0$ assuming Gaussian noise for this linear regression problem.

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```
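A sketch of the regularised function: with an $\ell_2$ penalty the closed form becomes $w_{reg} = (\lambda I + \Phi^\top \Phi)^{-1} \Phi^\top t$ (function name again my own choice):

```python
import numpy as np

def regularised_weights(Phi, t, lam):
    """Regularised ML solution: w_reg = (lambda*I + Phi^T Phi)^{-1} Phi^T t."""
    d = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(d) + Phi.T @ Phi, Phi.T @ t)

# sanity check: as lambda -> 0 this approaches the unregularised solution
rng = np.random.default_rng(2)
Phi = rng.normal(size=(30, 4))
t = rng.normal(size=30)
w_ml = np.linalg.lstsq(Phi, t, rcond=None)[0]
print(np.allclose(regularised_weights(Phi, t, 1e-10), w_ml))
```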

By calculating the RMSE on the training and test sets, evaluate the impact of regularisation for $\lambda = 1.1$.

What is the effect of regularisation?

*--- replace this with your solution, add and remove code and markdown cells as appropriate ---*

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```
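One property worth checking in your answer, sketched here on noisy synthetic data: the regularised solution always has a smaller (or equal) weight norm than the unregularised one, i.e. regularisation shrinks the weights towards zero:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 12))
y = X @ rng.normal(size=12) + rng.normal(scale=0.5, size=40)

w_ml = np.linalg.lstsq(X, y, rcond=None)[0]
lam = 1.1
w_reg = np.linalg.solve(lam * np.eye(12) + X.T @ X, X.T @ y)

# regularisation shrinks the weights towards zero
print(np.linalg.norm(w_reg) < np.linalg.norm(w_ml))
```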

You will now explore picking a good regularisation parameter.

What would you expect to see if you were under-regularising (so the parameter was too small)? Over-regularising? Discuss with a partner.

Plot the RMSE on the training and test sets against the regularisation parameter $\lambda$ for a range of values of $\lambda$. What is a good range of values of $\lambda$ to check? What do you think is the best value?

Hint: You may find you want to plot against $\log(\lambda)$. The functions `np.arange` and `np.linspace` could be useful here (use whichever you think is more applicable).

*--- replace this with your solution, add and remove code and markdown cells as appropriate ---*

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```
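A sketch of the sweep, with synthetic data in place of the Boston set and $\lambda$ spaced evenly in $\log(\lambda)$ (the range $10^{-4}$ to $10^{4}$ is just a reasonable starting guess, not a prescribed choice):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend; unnecessary inside the notebook
import matplotlib.pyplot as plt

def rmse(pred, target):
    return np.sqrt(np.mean((pred - target) ** 2))

rng = np.random.default_rng(3)
X = rng.normal(size=(506, 12))
y = X @ rng.normal(size=12) + rng.normal(scale=1.0, size=506)
idx = rng.permutation(506)
n_train = 506 // 5
Xtr, ytr = X[idx[:n_train]], y[idx[:n_train]]
Xte, yte = X[idx[n_train:]], y[idx[n_train:]]

lambdas = 10.0 ** np.linspace(-4, 4, 41)   # even spacing in log(lambda)
train_errs, test_errs = [], []
for lam in lambdas:
    w = np.linalg.solve(lam * np.eye(12) + Xtr.T @ Xtr, Xtr.T @ ytr)
    train_errs.append(rmse(Xtr @ w, ytr))
    test_errs.append(rmse(Xte @ w, yte))

plt.plot(np.log10(lambdas), train_errs, label='train RMSE')
plt.plot(np.log10(lambdas), test_errs, label='test RMSE')
plt.xlabel(r'$\log_{10}(\lambda)$')
plt.ylabel('RMSE')
plt.legend()
plt.show()
```

Training RMSE can only grow as $\lambda$ increases; the interesting feature is where the test RMSE bottoms out.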

We want to use basis functions to improve our performance. Implement subroutines for a polynomial basis function of degree 2. See the feature map based on the binomial formula.

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```
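One way to sketch a degree-2 polynomial feature map: a bias term, the linear terms, and all pairwise products $x_i x_j$ with $i \le j$ (function name my own choice):

```python
import numpy as np

def poly2_features(X):
    """Degree-2 polynomial basis: bias, linear terms, and all pairwise
    products x_i * x_j with i <= j."""
    n, d = X.shape
    cols = [np.ones((n, 1)), X]
    for i in range(d):
        for j in range(i, d):
            cols.append((X[:, i] * X[:, j])[:, None])
    return np.hstack(cols)

# a single example with d = 2 features expands to 1 + 2 + 3 = 6 features
F = poly2_features(np.array([[1.0, 2.0]]))
print(F)
```

In general $d$ input features map to $1 + d + d(d+1)/2$ output features, so expect the design matrix to grow considerably.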

Apply this to your train and test sets, and repeat the above exercise with these new features. Report what differences you see.

*--- replace this with your solution, add and remove code and markdown cells as appropriate ---*

In [ ]:

```
# replace this with your solution, add and remove code and markdown cells as appropriate
```