Neural network with hidden layer

Posted April 11, 2019 by Rokas Balsys



Neural network model:

Welcome to another tutorial. In the last tutorial series we wrote a logistic regression function; now it's time to build our first neural network, which will have one hidden layer. You will see that there is no big difference between this model and the one we implemented using logistic regression. As before, we will:

• Define the model structure (data shape).
• Initialize model's parameters.
• Create a loop to:
  - Implement forward propagation
  - Compute loss
  - Implement backward propagation to get the gradients
  - Update parameters (gradient descent)

We often build helper functions to compute the first three steps and then merge them into one function we'll call nn_model(). Once we have built nn_model() and learned the right parameters, we'll make predictions.
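
As a preview, here is a minimal sketch of how those helpers might be wired together. The names forward_propagation, compute_cost, backward_propagation and update_parameters are placeholders for functions we'll build later in this series, so treat this as a rough outline rather than the final implementation:

def nn_model(X, Y, hidden_layer, num_iterations=10000, learning_rate=0.01):
    # rough outline; forward_propagation, compute_cost, backward_propagation
    # and update_parameters are placeholder names for helpers built later
    input_layer = X.shape[0]
    output_layer = Y.shape[0]

    # steps 1-2: define the structure and initialize parameters
    parameters = initialize_parameters(input_layer, hidden_layer, output_layer)

    # step 3: training loop
    for i in range(num_iterations):
        A2, cache = forward_propagation(X, parameters)         # forward pass
        cost = compute_cost(A2, Y)                             # loss
        grads = backward_propagation(parameters, cache, X, Y)  # gradients
        parameters = update_parameters(parameters, grads, learning_rate)

    return parameters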

Model architecture:

You would agree that we can't get good results with logistic regression alone; we can train that model for as long as we want, but it won't improve much. So we are going to train a neural network with a single hidden layer:

Figure: network model with one hidden layer (Network-model.jpg)

The mathematical expression of the forward propagation algorithm for one example $x^{(i)}$ is: $$z^{[1] (i)} = W^{[1]} x^{(i)} + b^{[1]}\tag{1}$$ $$a^{[1] (i)} = \tanh(z^{[1] (i)})\tag{2}$$ $$z^{[2] (i)} = W^{[2]} a^{[1] (i)} + b^{[2]}\tag{3}$$ $$\hat{y}^{(i)} = a^{[2] (i)} = \sigma(z^{ [2] (i)})\tag{4}$$ $$y^{(i)}_{prediction} = \begin{cases} 1 & \mbox{if } a^{[2](i)} > 0.5 \\ 0 & \mbox{otherwise } \end{cases}\tag{5}$$
And the cost function we'll compute after forward propagation, averaged over all $m$ examples, is the cross-entropy cost: $$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large\left(\small y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \large \right) \small \tag{6}$$
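
As a rough sketch (assuming numpy is imported as np and a sigmoid helper like the one in the full code at the bottom of this tutorial), equations (1)-(6) translate almost directly into vectorized NumPy; the real forward_propagation function is built in the next tutorial:

def forward_propagation_sketch(X, Y, W1, b1, W2, b2):
    # X has shape (input size, m), Y has shape (1, m);
    # W1, b1, W2, b2 come from initialize_parameters below
    Z1 = np.dot(W1, X) + b1              # equation (1)
    A1 = np.tanh(Z1)                     # equation (2)
    Z2 = np.dot(W2, A1) + b2             # equation (3)
    A2 = sigmoid(Z2)                     # equation (4)
    predictions = (A2 > 0.5) * 1         # equation (5)
    # cross-entropy cost, equation (6); np.mean divides the sum by m
    cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    return A2, predictions, cost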

Initialize the model's parameters:

Since our model has one hidden layer, we need to initialize parameters for two layers: the weights and bias connecting the input to the hidden layer, and the weights and bias connecting the hidden layer to the output. The weights can't all start at zero: if they did, every hidden unit would compute the same value and receive the same gradient on every training iteration, so they would never learn different features. Initializing the weights with small random numbers breaks this symmetry, so the units train differently. The biases, however, can start at zero:

import numpy as np

def initialize_parameters(input_layer, hidden_layer, output_layer):
    # 1st layer weights, shape (hidden_layer, input_layer), small random values
    W1 = np.random.randn(hidden_layer, input_layer) * 0.01
    # 1st layer bias, shape (hidden_layer, 1), zeros
    b1 = np.zeros((hidden_layer, 1))
    # 2nd layer weights, shape (output_layer, hidden_layer), small random values
    W2 = np.random.randn(output_layer, hidden_layer) * 0.01
    # 2nd layer bias, shape (output_layer, 1), zeros
    b2 = np.zeros((output_layer, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
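
For example, with 12288 input features (64 * 64 * 3 pixel values), 7 hidden units and 1 output unit, the parameter shapes come out as follows (the layer sizes here are just an illustration):

parameters = initialize_parameters(12288, 7, 1)
print(parameters["W1"].shape)   # (7, 12288)
print(parameters["b1"].shape)   # (7, 1)
print(parameters["W2"].shape)   # (1, 7)
print(parameters["b2"].shape)   # (1, 1)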

Before computing forward propagation, we'll cover the tanh function, which we'll use as the activation of the hidden layer.
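
For reference, tanh and its derivative are: $$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad \frac{d}{dx}\tanh(x) = 1 - \tanh^{2}(x)$$ The second expression is exactly what the tanh_derivative function below computes.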

Just as we visualised the sigmoid and sigmoid_derivative functions in the logistic regression tutorial, generating data from -10 to 10, we'll use the same code for our tanh visualisation. Below is the full code used to plot the tanh and tanh_derivative functions:

import matplotlib.pyplot as plt
import numpy as np

def tanh_derivative(x):
    ds = 1 - np.power(np.tanh(x), 2)
    return ds

# linspace generates an array of 100 evenly spaced values between start and stop
values = np.linspace(-10, 10, 100)

# prepare the plot: tanh in r(ed), its derivative in b(lue)
plt.plot(values, np.tanh(values), 'r')
plt.plot(values, tanh_derivative(values), 'b')

# Draw the grid line in background.
plt.grid()

# Title
plt.title('Tanh and Tanh derivative functions')

# plt.plot(x)
plt.xlabel('X Axis')
plt.ylabel('Y Axis')

# create the graph
plt.show()

As a result we'll get the following graph:

Figure: tanh and tanh derivative functions

In the graph above, the red curve is the plot of our tanh function and the blue curve is our tanh_derivative function.

Full tutorial code:
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import scipy

ROWS = 64
COLS = 64
CHANNELS = 3

#TRAIN_DIR = 'Train_data/'
#TEST_DIR = 'Test_data/'

#train_images = [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR)]
#test_images =  [TEST_DIR+i for i in os.listdir(TEST_DIR)]

def read_image(file_path):
    img = cv2.imread(file_path, cv2.IMREAD_COLOR)
    return cv2.resize(img, (ROWS, COLS), interpolation=cv2.INTER_CUBIC)

def prepare_data(images):
    m = len(images)
    X = np.zeros((m, ROWS, COLS, CHANNELS), dtype=np.uint8)
    y = np.zeros((1, m))
    for i, image_file in enumerate(images):
        X[i,:] = read_image(image_file)
        if 'dog' in image_file.lower():
            y[0, i] = 1
        elif 'cat' in image_file.lower():
            y[0, i] = 0
    return X, y

def sigmoid(z):
    s = 1/(1+np.exp(-z))
    return s
'''
train_set_x, train_set_y = prepare_data(train_images)
test_set_x, test_set_y = prepare_data(test_images)

train_set_x_flatten = train_set_x.reshape(train_set_x.shape[0], ROWS*COLS*CHANNELS).T
test_set_x_flatten = test_set_x.reshape(test_set_x.shape[0], -1).T

train_set_x = train_set_x_flatten/255
test_set_x = test_set_x_flatten/255
'''

def initialize_parameters(input_layer, hidden_layer, output_layer):
    # 1st layer weights, shape (hidden_layer, input_layer), small random values
    W1 = np.random.randn(hidden_layer, input_layer) * 0.01
    # 1st layer bias, shape (hidden_layer, 1), zeros
    b1 = np.zeros((hidden_layer, 1))
    # 2nd layer weights, shape (output_layer, hidden_layer), small random values
    W2 = np.random.randn(output_layer, hidden_layer) * 0.01
    # 2nd layer bias, shape (output_layer, 1), zeros
    b2 = np.zeros((output_layer, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

In this tutorial part we initialized our model's parameters and visualized the tanh function. In the next tutorial we'll start building our forward propagation function, and you'll see that these functions are not that different from the ones we used in the logistic regression tutorial series.