Neural networks

COMP4670/8600 - Statistical Machine Learning - Tutorial

Setting up the environment

In this lab, we will train a neural network as a classifier.

Textbook Questions

These questions are hand-picked to be of reasonable difficulty and to demonstrate what you are expected to be able to solve. In Bishop, each question is labelled $\star$, $\star\star$, or $\star\star\star$ to indicate its difficulty.

  • Question 5.1: First derive the relationship between $\tanh(\alpha)$ and $\sigma(\alpha)$, then use Eq. 5.4 in Bishop to compare the two neural networks. The key point is having a clear definition of the notation you will use; a quick numerical sanity check of the identity appears after this list.

  • Question 5.2: Work out which parts of the Gaussian distribution involve the parameters once you take the logarithm.

  • Question 5.6: You should know where the activation comes from and be able to express the derivative of the logistic sigmoid function in terms of the sigmoid function itself.

  • Question 5.15: Use the symmetry between backpropagation and forward propagation.

  • Question 5.25: A challenging but important question that shows in which cases gradient descent really works. First express the current weight in terms of its previous value, then use mathematical induction to prove Eq. 5.197.

  • Question 5.27: Follow what Section 5.5.5 did.

  • Question 5.29: You should understand how to take the derivative of a Gaussian distribution.

  • Question 5.40: Mapping from the binary classification setting to the multiclass setting is a very practical need.
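
The following cell is only a quick numerical sanity check (not the derivation Question 5.1 asks for) of the identity $\tanh(\alpha) = 2\sigma(2\alpha) - 1$ that underlies it.

In [ ]:
# Numerical sanity check of tanh(a) = 2*sigma(2a) - 1, where sigma is the logistic sigmoid.
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

a = np.linspace(-5, 5, 101)
print(np.allclose(np.tanh(a), 2.0 * sigma(2.0 * a) - 1.0))  # expect True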

Assumed knowledge

  • Neural networks (lectures)
  • Classifiers (lab)

After this lab, you should be comfortable with:

  • Implementing a neural network
  • Calculating back-propagation formulas
In [ ]:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
%matplotlib inline

Load the data

We will be working with a similar dataset to the one used in the Classification lab. This is a census-income dataset, which shows income levels for people in the 1994 US Census. We will predict whether a person has $\leq \$50000$ or $> \$50000$ income per year.

Unlike in the Classification lab, this data is not linearly separable. That is, the linear classification techniques you learnt about in previous weeks are not effective on this data.

The data is included with this notebook as 05-dataset.tsv. Load the data into a NumPy array called data using NumPy's genfromtxt function. The column names are given in the variable columns below.

In [ ]:
columns = ['income', 'age', 'education', 'private-work', 'married', 'capital-gain', 'capital-loss', 'hours-per-week']
In [ ]:
data = np.genfromtxt("05-dataset.tsv")
data.shape

In this tutorial we will implement a neural network using only numpy functions.

Building blocks for a neural network

Neural network libraries like PyTorch and TensorFlow encapsulate different pieces of functionality in different classes. We will follow the same pattern here.

First, implement the fully connected layer, which computes $\mathbf{y} = X\mathbf{w} + \mathbf{b}$. It is called a Linear layer in PyTorch and a Dense layer in TensorFlow.
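
For reference when you derive the backward pass (one common way to organise the derivatives; check it against your own working, since the exact shapes depend on your conventions): if $Y = XW + \mathbf{b}$ and $G = \partial L / \partial Y$ is the gradient arriving from the next layer, then

$$\frac{\partial L}{\partial W} = X^\top G, \qquad \frac{\partial L}{\partial \mathbf{b}} = \sum_{n} G_{n,:}, \qquad \frac{\partial L}{\partial X} = G W^\top,$$

and a plain gradient-descent update with learning rate $\eta$ is $W \leftarrow W - \eta\, \partial L / \partial W$ and $\mathbf{b} \leftarrow \mathbf{b} - \eta\, \partial L / \partial \mathbf{b}$.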

In [ ]:
class FullyConnectedLayer():
    """
    This is a class skeleton provided.
    It should compute y = Xw + b and the corresponding gradients.
    If you have never defined a class in Python before, you probably want to read other Python tutorials first.
    """
    def __init__(self, in_features, out_features):
        """
        This is the init function where you have all the attributes needed defined.
        You don't have to modify anything in this function, but you should read it carefully.
        What each represents will be explained in the next few functions.
        """
        self.in_features = in_features
        self.out_features = out_features
        self.weight = np.zeros((in_features, out_features))
        self.bias = np.zeros((out_features, 1))
        self.g_weight = np.zeros((in_features, out_features))
        self.g_bias = np.zeros((out_features, 1))
        self.input = None

    def init_weights(self):
        """
        Currently, the weight and bias of this layer are initialized to zero, which is terrible.
        You want to re-initialize the weight from a standard normal distribution
        and the bias from a uniform distribution on the range 0 to 1.
        Or you can try different initialization methods.
        After you finish, comment out raise NotImplementedError.
        No return value is needed.
        """
        ###############
        #YOUR CODE HERE#
        ###############
        self.weight = None
        self.bias = None
        
        raise NotImplementedError

    def forward_pass(self, X):
        """
        Take the output of the last layer as X and return the result.
        Don't forget to save the input X to self.input. You will need the input for the gradient calculation.
        After you finish, comment out raise NotImplementedError.
        If you are new to Python/NumPy, you probably want to figure out what broadcasting is
        (see http://cs231n.github.io/python-numpy-tutorial/#numpy-broadcasting).
        """
        ###############
        #YOUR CODE HERE#
        ###############
        
        raise NotImplementedError
        
        out = None
        
        return out
        

    def backward_pass(self, g_next_layer):
        """
        g_next_layer is the gradient passed from the next layer (the layer after the current layer in the forward pass).
        You need to calculate 3 things.
        First, the gradient with respect to the bias, self.g_bias.
        Second, the gradient with respect to the weights, self.g_weight.
        Third, the gradient with respect to the output of the last layer (i.e. the input X of this layer), g_last_layer.
        Save the gradient with respect to the bias and the gradient with respect to the weights.
        Return the gradient with respect to the last layer.
        """
        
        ###############
        #YOUR CODE HERE#
        ###############
        
        raise NotImplementedError

        self.g_weight = None
        self.g_bias = None

        g_last_layer = None

        return g_last_layer

    def update(self, learning_rate):
        """
        Update the weight and bias using the gradients that you just calculated.
        No return value is needed.
        """
        
        ###############
        #YOUR CODE HERE#
        ###############
        
        raise NotImplementedError
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Question 1:

Why is initialising weights and biases to zero terrible?

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Now let's implement the sigmoid function and the sigmoid layer.
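
Before filling in the skeletons, here is a hedged sketch of one numerically stable way to write the sigmoid; _stable_sigmoid_sketch is only an illustrative name and not part of the skeleton below. For the backward pass of the Sigmoid layer, the identity $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$ is what you will need.

In [ ]:
# Hedged sketch: a numerically stable sigmoid (reference only, not a prescribed solution).
# For x >= 0 we use 1 / (1 + exp(-x)); for x < 0 we use exp(x) / (1 + exp(x)),
# so that np.exp is never called on a large positive argument.
def _stable_sigmoid_sketch(X):
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    pos = X >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-X[pos]))
    exp_x = np.exp(X[~pos])          # safe: X < 0 here, so exp(X) < 1
    out[~pos] = exp_x / (1.0 + exp_x)
    return out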

In [ ]:
def sigmoid(X):
    """
    Make sure that your function works when X is a matrix.
    Use functions from numpy instead of functions from math.
    """
    ###############
    #YOUR CODE HERE#
    ###############
    
    raise NotImplementedError

    return None
In [ ]:
class Sigmoid():
    def __init__(self):
        """
        This is the init function where you have all the attributes needed defined.
        You don't have to modify anything in this function, but you should read it carefully.
        """
        self.input = None

    def forward_pass(self, X):
        """
        Apply the sigmoid function to the input and save the input for later.
        """
        ###############
        #YOUR CODE HERE#
        ###############
        
        raise NotImplementedError
        
        out = None

        return out

    def backward_pass(self, g_next_layer):
        """
        Calculate the gradient with respect to the input.
        g_next_layer is the gradient passed from the next layer (the layer after the current layer in the forward pass).
        Return the gradient with respect to the output of the last layer.
        """
        ###############
        #YOUR CODE HERE#
        ###############
        
        raise NotImplementedError

        g_last_layer = None

        return g_last_layer

    def update(self, learning_rate):
        """
        There is no parameter to update for this layer, but we still define this function to maintain a uniform interface.
        """
        pass
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Question 2:

Why do we need the activation function to be non-linear?

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Now let's implement the binary cross-entropy loss, and yes, this is the same loss that you used in logistic regression. Binary cross-entropy loss can only deal with two classes; for more than two classes you need the softmax function and cross-entropy loss, but let's not worry about that here.
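
For reference, this is the same cross-entropy error as in logistic regression, averaged over the batch of $N$ examples:

$$E = -\frac{1}{N}\sum_{n=1}^{N}\Big[ t_n \ln y_n + (1 - t_n)\ln(1 - y_n) \Big], \qquad \frac{\partial E}{\partial y_n} = \frac{1}{N}\,\frac{y_n - t_n}{y_n(1 - y_n)}.$$

In practice it is common to clip $y_n$ to, say, $[10^{-12},\, 1 - 10^{-12}]$ before taking logarithms, so that predictions of exactly 0 or 1 do not produce infinities.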

In [ ]:
class BinaryCrossEntropyLoss():
    def __init__(self):
        """
        This is the init function where you have all the attributes needed defined.
        You don't have to modify anything in this function, but you should read it carefully.
        """
        self.input_y = None
        self.input_t = None
        self.input_N = None


    def forward_pass(self, y, t):
        """
        y: batch_size * 1, with 0 <= y <= 1, the predictions
        t: batch_size * 1, the targets
        (Make sure y and t have the same shape; (N,) and (N, 1) are different!)
        
        Save the input y, t and batch size N and calculate the loss.
        Return the mean of the loss (a scalar).
        """
        
        ###############
        #YOUR CODE HERE#
        ###############
        
        raise NotImplementedError

        loss = None

        return loss
    
    def backward_pass(self, g_next_layer = 1):
        """
        Normally, the loss layer is the last layer in a neural network, so we default g_next_layer to 1.
        Calculate the gradient of the loss with respect to the input y.
        """
        ###############
        #YOUR CODE HERE#
        ###############
        
        raise NotImplementedError
        
        g_last_layer = None
        
        return g_last_layer

    def update(self, learning_rate):
        """
        There is no parameter to update for this layer, but we still define this function to maintain a uniform interface.
        """
        pass
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate

Let's implement a neural network with one hidden layer to solve this classification problem.

Question 3

How many input units would there be? How many output units?

Answer

--- replace this with your solution, add and remove code and markdown cells as appropriate ---

Put them together

Put the building blocks together to form a neural network.

In [ ]:
class Network():
    def __init__(self):
        """
        Since our simple neural network acts sequentially,
        we can put all layers in a list for convenient traversal.
        Initialize all layers that you need (two fully connected layers, two sigmoid layers).
        Append them to the list in the correct order.
        Don't forget to initialize the weights for the fully connected layers. Choose a sensible hidden layer size.
        """
        self.sequential = []
        
        ###############
        #YOUR CODE HERE#
        ###############
        
        raise NotImplementedError
        
        
    def forward_pass(self, X):

        for l in self.sequential:

            X = l.forward_pass(X)

        return X

    def backward_pass(self, grad):
        
        for l in reversed(self.sequential):

            grad = l.backward_pass(grad)
            

    def update(self, learning_rate):

        for l in self.sequential:

            l.update(learning_rate)
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate
In [ ]:
np.random.seed(1)
net = Network()
bce = BinaryCrossEntropyLoss()

You have already set up a neural network. In the backward_pass function of each layer, you calculated the gradient with respect to the input and the gradient with respect to the weights. Now, get a pen and paper and try to substitute each g_next_layer with the backward_pass of the layer after it, all the way back to the binary cross-entropy loss. Don't change any code, just write it down.

Compare the result with the textbook. I hope you can see how this layer structure naturally implements the chain rule; a sketch of the unrolled gradient is given below.
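
As a hedged sketch of what that substitution looks like (with the sigmoid applied elementwise, and transposes depending on the shape conventions you chose): for a two-layer network with hidden activations $Z = \sigma(A^{(1)})$, $A^{(1)} = X W^{(1)} + \mathbf{b}^{(1)}$, output $y = \sigma(A^{(2)})$, $A^{(2)} = Z W^{(2)} + \mathbf{b}^{(2)}$, and loss $L$, unrolling the backward passes gives

$$\frac{\partial L}{\partial W^{(1)}} = X^\top \left[ \left( \left( \frac{\partial L}{\partial y} \odot \sigma'(A^{(2)}) \right) {W^{(2)}}^\top \right) \odot \sigma'(A^{(1)}) \right],$$

i.e. each layer's backward_pass multiplies the incoming gradient by that layer's local derivative, which is exactly the chain rule written one layer at a time.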

Training the neural network

Split your data in half randomly, into a training set and a test set. Train the neural network on your training set. You may want to google sklearn's train_test_split and accuracy_score. If you get a memory error, make sure that all vectors you calculate have shape (D, 1); (D, 1) and (D,) are different. Don't forget to normalize each feature to have mean 0 and variance 1. A hedged sketch of one way to prepare the data is given below.
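
The cell below is only a sketch of one possible data preparation, not a prescribed solution. It assumes that data loaded above is purely numeric, that its first column is the income label already encoded as 0/1 (check your file), that the remaining columns are the features in columns[1:], and it uses sklearn.model_selection.train_test_split as suggested above.

In [ ]:
# Hedged sketch of data preparation (assumptions: numeric array, first column = 0/1 income label).
from sklearn.model_selection import train_test_split

X_all = data[:, 1:]                    # features
t_all = data[:, 0:1]                   # targets, kept with shape (N, 1), not (N,)

# Standardise each feature to mean 0 and variance 1.
X_all = (X_all - X_all.mean(axis=0)) / X_all.std(axis=0)

# Random 50/50 split into a training set and a test set.
X_train, X_test, t_train, t_test = train_test_split(
    X_all, t_all, test_size=0.5, random_state=0)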

Plot your training accuracy curve against your testing accuracy curve, and your training loss curve against your testing loss curve. (A minimal plotting sketch appears after the solution cells at the end of this notebook.)

In [ ]:
"""
generate your training and testing set here
"""

###############
#YOUR CODE HERE#
###############

training_epoch = 1000

train_acc_list = np.zeros(training_epoch)
train_loss_list = np.zeros(training_epoch)
test_acc_list = np.zeros(training_epoch)
test_loss_list = np.zeros(training_epoch)

for i in range(training_epoch):
    """
    This is the main training loop.
    You need to first run a forward pass, get the predicted probability, 
    put it into the loss function, calculate the loss and do a backward pass.
    Then update the network.
    Calculate the test loss and the current accuracy on both the train and test sets.
    Save these to a list for plotting later.
    Experiment to find a good learning rate.
    """
    
    ###############
    #YOUR CODE HERE#
    ###############

    train_acc = 0    
    train_loss = 0

    grad = None

    train_acc_list[i] = train_acc
    train_loss_list[i] = train_loss

    """
    Calculate the accuracy and loss on the testing set.
    """
    
    ###############
    #YOUR CODE HERE#
    ###############

    test_acc = 0
    test_loss = 0

    test_acc_list[i] = test_acc
    test_loss_list[i] = test_loss

    print("iteration %d: train_acc %f, train_loss %f, test_acc %f, test_loss %f" %(i+1, train_acc, train_loss, test_acc, test_loss))
   
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate
In [ ]:
# replace this with your solution, add and remove code and markdown cells as appropriate
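
In case a starting point helps, here is a minimal plotting sketch for the curves requested earlier; it assumes the training loop above has filled train_acc_list, test_acc_list, train_loss_list and test_loss_list.

In [ ]:
# Minimal sketch: accuracy and loss curves, train vs. test, over training epochs.
epochs = np.arange(1, training_epoch + 1)

plt.figure()
plt.plot(epochs, train_acc_list, label="train accuracy")
plt.plot(epochs, test_acc_list, label="test accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()

plt.figure()
plt.plot(epochs, train_loss_list, label="train loss")
plt.plot(epochs, test_loss_list, label="test loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()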

Here is a super cool website: https://playground.tensorflow.org/. Have fun there.

In [ ]: