Neural network with hidden layer

Posted April 18, 2019 by Rokas Balsys



Final one hidden layer (OHL) neural network model

This is the last part of the tutorial, where we'll build our final neural network model in nn_model(). The model calls the functions from the previous parts in the right order.

First we'll write a predict function. I'll reuse part of the prediction code from my logistic regression tutorial: we run forward propagation and turn the output probabilities into predictions.


Coding the prediction function:

We will implement the prediction function, but first let's see what its inputs and outputs are:

Arguments:
parameters - python dictionary containing our parameters
X - data of size (ROWS * COLS * CHANNELS, number of examples)
Return:
Y_prediction - a numpy array (vector) containing all predictions (0/1) for the examples in X

def predict(parameters, X):
    # Placeholder for the predictions, one per example
    Y_prediction = np.zeros((1, X.shape[1]))
    # Compute probabilities using forward propagation
    A2, cache = forward_propagation(X, parameters)
    
    for i in range(A2.shape[1]):
        # Convert probability A2[0, i] to an actual prediction Y_prediction[0, i]
        if A2[0, i] > 0.5:
            Y_prediction[0, i] = 1
        else:
            Y_prediction[0, i] = 0
    
    return Y_prediction
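
The loop above can also be written as a single vectorized comparison. This is only a minimal sketch, assuming A2 has shape (1, number of examples) as returned by forward_propagation(). The name threshold_predictions is a hypothetical helper, not part of the tutorial code, and the probabilities are made up:

```python
import numpy as np

def threshold_predictions(A2, threshold=0.5):
    # Hypothetical helper: marks every probability strictly above the
    # threshold as class 1 and everything else as class 0.
    return (A2 > threshold).astype(float)

# Made-up probabilities for four examples
A2_example = np.array([[0.1, 0.5, 0.51, 0.9]])
print(threshold_predictions(A2_example))  # [[0. 0. 1. 1.]]
```

Note that a probability of exactly 0.5 is predicted as 0, the same as in the loop version.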


Coding the nn_model() function:

We will implement the final model, but as before, first let's see what its inputs and outputs are:

Arguments:
X_train - training set represented by a numpy array of shape (ROWS * COLS * CHANNELS, number of examples)
Y_train - training labels represented by a numpy array (vector) of shape (1, number of examples)
X_test - test set represented by a numpy array of shape (ROWS * COLS * CHANNELS, number of examples)
Y_test - test labels represented by a numpy array (vector) of shape (1, number of examples)
n_h - size of the hidden layer
num_iterations - hyperparameter representing the number of iterations to optimize the parameters
learning_rate - hyperparameter representing the learning rate used in the update rule of update_parameters()
print_cost - set to True to print the cost every 200 iterations

Return:
parameters - parameters learnt by the model. They can then be used to predict. The dictionary also stores the recorded costs ("costs") and the hidden layer size ("n_h").

def nn_model(X_train, Y_train, X_test, Y_test, n_h, num_iterations = 1000, learning_rate = 0.05, print_cost=False):
    n_x = X_train.shape[0]
    n_y = Y_train.shape[0]

    # Initialize parameters with inputs: "n_x, n_h, n_y"
    parameters = initialize_parameters(n_x, n_h, n_y)
    
    # Retrieve W1, b1, W2, b2
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    b1 = parameters["b1"]
    b2 = parameters["b2"]

    costs = []
    for i in range(0, num_iterations):
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X_train, parameters)
        
        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y_train, parameters)
 
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X_train, Y_train)
 
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads, learning_rate)
        
        # Print the cost every 200 iterations
        if print_cost and i % 200 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

        # Record the cost
        if i % 100 == 0:
            costs.append(cost)
    
    # Predict test/train set examples
    Y_prediction_test = predict(parameters,X_test)
    Y_prediction_train = predict(parameters,X_train)

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    parameters.update({"costs": costs, "n_h": n_h})
    return parameters
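
The accuracy lines at the end of nn_model() work because both labels and predictions are 0 or 1: np.abs(prediction - label) is 1 for every wrong prediction and 0 for every correct one, so its mean is the error rate. Here is a small sketch of that calculation with made-up values (accuracy_percent is a hypothetical helper name, not part of the tutorial code):

```python
import numpy as np

def accuracy_percent(Y_prediction, Y):
    # Mean absolute difference between 0/1 predictions and 0/1 labels
    # is the fraction of wrong predictions (the error rate).
    error_rate = np.mean(np.abs(Y_prediction - Y))
    return 100 - error_rate * 100

# Made-up labels and predictions: the third example is misclassified
Y_true = np.array([[1, 0, 1, 1]])
Y_pred = np.array([[1., 0., 0., 1.]])
print(accuracy_percent(Y_pred, Y_true))  # 75.0
```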

It is time to run the model and see how it performs on our cats vs dogs dataset. Run the following code to test your model with a single hidden layer of n_h hidden units.

parameters = nn_model(train_set_x, train_set_y, test_set_x, test_set_y, n_h = 10, num_iterations = 3000, learning_rate = 0.05, print_cost=True)

Best choice of hidden layer size:

In our logistic regression tutorial we compared results with different learning rates. Neural networks are able to learn even highly non-linear decision boundaries, unlike logistic regression. This time we'll compare several choices for the hidden layer size (the number of hidden units) of our model. Run the code below. Feel free to try values other than the ones I have chosen:

Note: I modified the nn_model function, so it may differ from what you see in the video tutorial. After training the model I ran into a few errors, and I fixed them so that we could plot a cost chart.

hidden_layer = [10, 50, 100, 200, 400]
models = {}
for i in hidden_layer:
    print ("hidden layer is: ",i)
    models[i] = nn_model(train_set_x, train_set_y, test_set_x, test_set_y, n_h = i, num_iterations = 10000, learning_rate = 0.1, print_cost = True)
    print ("-------------------------------------------------------")

for i in hidden_layer:
    plt.plot(np.squeeze(models[i]["costs"]), label= str(models[i]["n_h"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

With num_iterations = 2000 and learning_rate = 0.1 we get the following training and testing results (only the last cost lines of each run are shown):

hidden layer is: 10
Cost after iteration 1400: 0.586238
Cost after iteration 1600: 0.572674
Cost after iteration 1800: 0.571317
train accuracy: 74.20859713428857 %
test accuracy: 60.3 %
-------------------------------------------------------
hidden layer is: 50
Cost after iteration 1400: 0.554478
Cost after iteration 1600: 0.528002
Cost after iteration 1800: 0.512501
train accuracy: 70.37654115294902 %
test accuracy: 60.4 %
-------------------------------------------------------
hidden layer is: 100
Cost after iteration 1400: 0.561368
Cost after iteration 1600: 0.530406
Cost after iteration 1800: 0.563748
train accuracy: 70.35988003998668 %
test accuracy: 61.0 %
-------------------------------------------------------
hidden layer is: 200
Cost after iteration 1400: 0.596620
Cost after iteration 1600: 0.550028
Cost after iteration 1800: 0.541246
train accuracy: 69.86004665111629 %
test accuracy: 59.8 %
-------------------------------------------------------
hidden layer is: 400
Cost after iteration 1400: 0.606300
Cost after iteration 1600: 0.577356
Cost after iteration 1800: 0.572363
train accuracy: 71.242919026991 %
test accuracy: 60.4 %
-------------------------------------------------------

Figure_1.png

With num_iterations = 10000 and learning_rate = 0.05 we get the following training and testing results (only the last cost lines of each run are shown):

hidden layer is: 10
Cost after iteration 9400: 0.296215
Cost after iteration 9600: 0.285913
Cost after iteration 9800: 0.440895
train accuracy: 82.3558813728757 %
test accuracy: 60.3 %
-------------------------------------------------------
hidden layer is: 50
Cost after iteration 9400: 0.126889
Cost after iteration 9600: 0.186118
Cost after iteration 9800: 0.138445
train accuracy: 94.65178273908697 %
test accuracy: 59.300000000000004 %
-------------------------------------------------------
hidden layer is: 100
Cost after iteration 9400: 0.161640
Cost after iteration 9600: 0.194643
Cost after iteration 9800: 0.105035
train accuracy: 79.35688103965344 %
test accuracy: 59.699999999999996 %
-------------------------------------------------------
hidden layer is: 200
Cost after iteration 9400: 0.113325
Cost after iteration 9600: 0.166675
Cost after iteration 9800: 0.133236
train accuracy: 89.4368543818727 %
test accuracy: 61.3 %
-------------------------------------------------------
hidden layer is: 400
Cost after iteration 9400: 3.607211
Cost after iteration 9600: 0.349736
Cost after iteration 9800: 0.157746
train accuracy: 97.3842052649117 %
test accuracy: 62.8 %
-------------------------------------------------------

Figure_2.png

From these results you can see that train accuracy is much higher than test accuracy, which is a sign of overfitting. It means our model struggles to classify an animal in an image it hasn't seen before. We can't do much better here with a one hidden layer neural network; we'll see what we get with a deep neural network.
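
To make the gap concrete, here is a small sketch that computes train accuracy minus test accuracy for the 10000-iteration run. The accuracies are copied from the output printed above and rounded to two decimals; the variable names are my own:

```python
# (train accuracy %, test accuracy %) per hidden layer size, taken from
# the 10000-iteration output above and rounded to two decimals
results = {
    10: (82.36, 60.3),
    50: (94.65, 59.3),
    100: (79.36, 59.7),
    200: (89.44, 61.3),
    400: (97.38, 62.8),
}

gaps = {}
for n_h, (train_acc, test_acc) in results.items():
    # A large positive gap means the model fits the training set
    # much better than unseen data
    gaps[n_h] = train_acc - test_acc
    print("n_h = {:3d}: train {:.2f} %, test {:.2f} %, gap {:.2f}".format(
        n_h, train_acc, test_acc, gaps[n_h]))
```

In every run the gap is around 20 percentage points or more, while test accuracy stays close to 60 % regardless of the hidden layer size.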

By the way, you can see that our neural network with 400 hidden units is just 3% better than our logistic regression model, which is not that impressive. We'll see what we can get with a deeper network.

I uploaded the full tutorial code to the same GitHub page where I uploaded the final logistic regression code, because we use the same dataset. After we finish our deep neural networks tutorial, we'll compare the results from all of them.


So we have finished another tutorial series, this one about neural networks with one hidden layer. If you ran the code above yourself, you may say that it's not that different from our logistic regression code. But teaching our model to tell cats from dogs takes a really long time, and the accuracy we get is not worth the training time. So in our next tutorial series we'll start building deep neural networks, and we'll stop using the inefficient sigmoid function.

To get more experience with this model, you can test its performance on different datasets. Neural networks with one hidden layer may work better on tasks where we don't need to recognize objects in images. You can also try playing with the learning rate or the number of iterations.

See you in the next step-by-step deep neural networks tutorial.