TensorFlow Neural Networks explained

Posted May 14, 2019 by Rokas Balsys

Building our first neural network in tensorflow:

In this tutorial part we will build a deep neural network using tensorflow. Remember that there are two parts to implement a tensorflow model:

  • Create the computation graph
  • Run the graph

In this part we'll use the same Cats vs Dogs data-set we used in our previous tutorials. But instead of a sigmoid output, this time we'll use the softmax function, so if you want you can add more classes to recognize. Because we use softmax, we need to change the shape of our labels, so I wrote a convert_to_one_hot function; you know what it does if you watched my previous tutorial. The shape of our data will also be a little different from before. Everything is in the following lines:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
ROWS = 64
COLS = 64
CHANNELS = 3   # RGB colors
CLASSES = 2    # cat (0) and dog (1)
def read_image(file_path):
    img = cv2.imread(file_path, cv2.IMREAD_COLOR)
    return cv2.resize(img, (ROWS, COLS), interpolation=cv2.INTER_CUBIC)

def prepare_data(images):
    m = len(images)
    X = np.zeros((m, ROWS, COLS, CHANNELS), dtype=np.uint8)
    y = np.zeros((1, m), dtype=np.uint8)
    for i, image_file in enumerate(images):
        X[i,:] = read_image(image_file)
        if 'dog' in image_file.lower():
            y[0, i] = 1
        elif 'cat' in image_file.lower():
            y[0, i] = 0
    return X, y

def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y
TRAIN_DIR = 'Train_data/'
TEST_DIR = 'Test_data/'

train_images = [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR)]
test_images =  [TEST_DIR+i for i in os.listdir(TEST_DIR)]

train_set_x, train_set_y = prepare_data(train_images)
test_set_x, test_set_y = prepare_data(test_images)

train_set_x_flatten = train_set_x.reshape(train_set_x.shape[0], ROWS*COLS*CHANNELS).T
test_set_x_flatten = test_set_x.reshape(test_set_x.shape[0], -1).T

X_train = train_set_x_flatten/255
X_test = test_set_x_flatten/255

Y_train = convert_to_one_hot(train_set_y, CLASSES)
Y_test = convert_to_one_hot(test_set_y, CLASSES)

As usual we flatten the image dataset, then normalize it by dividing by 255. On top of that, we will convert each label to a one-hot vector:

print ("number of training examples =", X_train.shape[1])
print ("number of test examples =", X_test.shape[1])
print ("X_train shape:", X_train.shape)
print ("Y_train shape:", Y_train.shape)
print ("X_test shape:", X_test.shape)
print ("Y_test shape:", Y_test.shape)


number of training examples = 6002
number of test examples = 1000
X_train shape: (12288, 6002)
Y_train shape: (2, 6002)
X_test shape: (12288, 1000)
Y_test shape: (2, 1000)

Note that 12288 comes from $64 \times 64 \times 3$. Each image is square, 64 by 64 pixels, and 3 is for the RGB colors. Please make sure all these shapes make sense to you before continuing.
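
To check these shapes without loading the real data-set, here is a small NumPy sketch. It assumes a made-up batch of 5 random "images" (toy_images and toy_labels are illustrative names, not part of the tutorial code) and repeats the flatten, normalize and one-hot steps from above:

```python
import numpy as np

ROWS, COLS, CHANNELS, CLASSES = 64, 64, 3, 2

def convert_to_one_hot(Y, C):
    # Row k of np.eye(C) is the one-hot vector for class k
    return np.eye(C)[Y.reshape(-1)].T

# Hypothetical toy data: 5 random "images" and their labels (0 = cat, 1 = dog)
rng = np.random.default_rng(0)
toy_images = rng.integers(0, 256, size=(5, ROWS, COLS, CHANNELS)).astype(np.uint8)
toy_labels = np.array([[0, 1, 1, 0, 1]], dtype=np.uint8)

# Flatten each image into one column, then scale pixel values to [0, 1]
toy_X = toy_images.reshape(toy_images.shape[0], -1).T / 255
toy_Y = convert_to_one_hot(toy_labels, CLASSES)

print(toy_X.shape)  # (12288, 5)
print(toy_Y.shape)  # (2, 5)
print(toy_Y)        # each column holds a single 1 in the row of its class
```

Each column is one example, which is the same layout the real X_train and Y_train use.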

Our goal is to build an algorithm capable of recognizing a cat and dog with high accuracy. To do so, we are going to build a tensorflow model that is almost the same as one we have previously built in numpy for cat recognition (but now using a softmax output). It is a great occasion to compare your numpy implementation to the tensorflow one. You'll see that training the model in tensorflow is significantly faster.

The model will be LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX. The SIGMOID output layer has been converted to a SOFTMAX. A SOFTMAX layer generalizes SIGMOID to when there are more than two classes.
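
To see in what sense softmax generalizes sigmoid, here is a small NumPy sketch with made-up logits. With two classes, if one class gets the logit z and the other gets 0, the softmax probability of the first class is exactly sigmoid(z). The helper functions are written here only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Each column is one example; subtract the column max for numerical stability
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])                       # made-up logits for three examples
two_class_logits = np.vstack([z, np.zeros_like(z)])  # first class gets z, second gets 0

probs = softmax(two_class_logits)
print(np.allclose(probs[0], sigmoid(z)))    # True
print(np.allclose(probs.sum(axis=0), 1.0))  # True: each column sums to 1
```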

1 - Create placeholders:

Our first task is to create placeholders for X and Y. This will allow us later to pass our training data in when we run our session.

Below is a function to create the placeholders in tensorflow:

n_x - scalar, size of an image vector (ROWS * COLS * CHANNELS = 64 * 64 * 3 = 12288)
n_y - scalar, number of classes (from 0 to 1, so -> 2)

X - placeholder for the data input, of shape [n_x, None] and dtype "float"
Y - placeholder for the input labels, of shape [n_y, None] and dtype "float"

We use None because it lets us be flexible on the number of examples we feed into the placeholders.
In fact, the number of examples differs between training and testing.

def create_placeholders(n_x, n_y):

    X = tf.placeholder(tf.float32, shape=(n_x, None), name = 'X')
    Y = tf.placeholder(tf.float32, shape=(n_y, None), name = 'Y')
    return X, Y
X, Y = create_placeholders(X_train.shape[0], CLASSES)
print ("X = " + str(X))
print ("Y = " + str(Y))


X = Tensor("X_1:0", shape=(12288, ?), dtype=float32)
Y = Tensor("Y_1:0", shape=(2, ?), dtype=float32)

2 - Initializing the parameters:

Our second task is to initialize the parameters in tensorflow.

We are going to use Xavier Initialization for weights and Zero Initialization for biases.

INPUT, h1, h2, OUTPUT - size of model layers

parameters - a dictionary of tensors containing W1, b1, W2, b2, W3, b3

def initialize_parameters(INPUT, h1, h2, OUTPUT):
    W1 = tf.get_variable("W1", [h1, INPUT], initializer = tf.contrib.layers.xavier_initializer())
    b1 = tf.get_variable("b1", [h1, 1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [h2, h1], initializer = tf.contrib.layers.xavier_initializer())
    b2 = tf.get_variable("b2", [h2, 1], initializer = tf.zeros_initializer())
    W3 = tf.get_variable("W3", [OUTPUT, h2], initializer = tf.contrib.layers.xavier_initializer())
    b3 = tf.get_variable("b3", [OUTPUT, 1], initializer = tf.zeros_initializer())

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    return parameters
with tf.Session() as sess:
    parameters = initialize_parameters(X_train.shape[0], 25, 12, CLASSES)
    print("W1 = ", parameters["W1"])
    print("b1 = ", parameters["b1"])
    print("W2 = ", parameters["W2"])
    print("b2 = ", parameters["b2"])
    print("W3 = ", parameters["W3"])
    print("b3 = ", parameters["b3"])


W1 = <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>
b1 = <tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref>
W2 = <tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref>
b2 = <tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref>
W3 = <tf.Variable 'W3:0' shape=(2, 12) dtype=float32_ref>
b3 = <tf.Variable 'b3:0' shape=(2, 1) dtype=float32_ref>
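
For intuition about what Xavier initialization draws, here is a NumPy sketch. It assumes the uniform variant (as far as I know, that is the default of tf.contrib.layers.xavier_initializer), where weights are sampled from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The xavier_uniform helper is a hypothetical name, not a TensorFlow function:

```python
import numpy as np

def xavier_uniform(shape, rng):
    # Weights are stored as (units, inputs) in this tutorial
    fan_out, fan_in = shape
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

rng = np.random.default_rng(1)
W1_toy = xavier_uniform((25, 12288), rng)  # same shape as W1 above
b1_toy = np.zeros((25, 1))                 # zero initialization for the bias

limit = np.sqrt(6.0 / (25 + 12288))
print(W1_toy.shape, b1_toy.shape)
print(np.abs(W1_toy).max() <= limit)       # all weights stay inside the limit
print(W1_toy.var(), 2.0 / (25 + 12288))    # sample variance is close to 2 / (fan_in + fan_out)
```

The point of the scaling is that layers with many inputs get smaller initial weights, so the activations do not blow up or vanish at the start of training.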

3 - Forward propagation in tensorflow:

We will now implement the forward propagation module in tensorflow. The function will take in a dictionary of parameters and it will complete the forward pass. The functions we will be using are:

  • tf.add(...,...) to do an addition
  • tf.matmul(...,...) to do a matrix multiplication
  • tf.nn.relu(...) to apply the ReLU activation

Note: I commented the numpy equivalents so that you can compare the tensorflow implementation to numpy. It is important to note that the forward propagation stops at z3. The reason is that in tensorflow the last linear layer output is given as input to the function computing the loss. Therefore, you don't need a3!

X - input dataset placeholder, of shape (input size, number of examples)
parameters - python dictionary containing our parameters "W1", "b1", "W2", "b2", "W3", "b3"; the shapes are given in initialize_parameters

Z3 - the output of the last LINEAR unit

def forward_propagation(X, parameters):
    # Retrieving parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']

                                      # Numpy Equivalents:
    Z1 = tf.add(tf.matmul(W1,X),b1)   # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)               # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2,A1),b2)  # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)               # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3,A2),b3)  # Z3 = np.dot(W3, A2) + b3

    return Z3

with tf.Session() as sess:
    X, Y = create_placeholders(X_train.shape[0], CLASSES)
    parameters = initialize_parameters(X_train.shape[0], 25, 12, CLASSES)
    Z3 = forward_propagation(X, parameters)
    print("Z3 =", Z3)


Z3 = Tensor("Add_2:0", shape=(2, ?), dtype=float32)

You may have noticed that the forward propagation doesn't output any cache. You will understand why soon, when we get to backpropagation.
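
For comparison, here is the same forward pass as a plain NumPy sketch on toy shapes. The layer sizes, the random parameters and the name forward_propagation_np are made up for illustration; only the sequence LINEAR -> RELU -> LINEAR -> RELU -> LINEAR comes from the tutorial:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward_propagation_np(X, parameters):
    # Same three linear steps as the tensorflow version, stopping at Z3
    Z1 = np.dot(parameters["W1"], X) + parameters["b1"]
    A1 = relu(Z1)
    Z2 = np.dot(parameters["W2"], A1) + parameters["b2"]
    A2 = relu(Z2)
    Z3 = np.dot(parameters["W3"], A2) + parameters["b3"]
    return Z3

# Toy sizes: 8 inputs, hidden layers of 5 and 4 units, 2 classes, 3 examples
rng = np.random.default_rng(2)
n_x, h1, h2, n_y, m = 8, 5, 4, 2, 3
toy_parameters = {
    "W1": rng.standard_normal((h1, n_x)), "b1": np.zeros((h1, 1)),
    "W2": rng.standard_normal((h2, h1)),  "b2": np.zeros((h2, 1)),
    "W3": rng.standard_normal((n_y, h2)), "b3": np.zeros((n_y, 1)),
}
X_toy = rng.random((n_x, m))

Z3_toy = forward_propagation_np(X_toy, toy_parameters)
print(Z3_toy.shape)  # (2, 3): one column of logits per example
```

The (h, 1) biases are broadcast across the columns, which is the same thing tf.add does in the tensorflow version.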

4 - Compute cost:

As seen before, it is very easy to compute the cost using:

    tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))

  • It is important to know that the "logits" and "labels" inputs of tf.nn.softmax_cross_entropy_with_logits are expected to be of shape (number of examples, num_classes) so I have transposed Z3 and Y for you.
  • Besides, tf.reduce_mean averages the per-example losses over the examples.

Z3 - output of forward propagation (output of the last LINEAR unit), of shape (CLASSES, number of examples)
Y - "true" labels vector placeholder, same shape as Z3

cost - Tensor of the cost function

def compute_cost(Z3, Y):   
    # fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = labels))
    return cost

with tf.Session() as sess:
    X, Y = create_placeholders(X_train.shape[0], CLASSES)
    parameters = initialize_parameters(X_train.shape[0], 25, 12, CLASSES)
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost =",cost)


cost = Tensor("Mean:0", shape=(), dtype=float32)
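
To make the cost concrete, here is a NumPy sketch of what this computation amounts to: a softmax cross-entropy per example, then the mean over examples. It is a hand-written approximation for illustration (softmax_cross_entropy_mean is a hypothetical name), not TensorFlow's implementation:

```python
import numpy as np

def softmax_cross_entropy_mean(Z3, Y):
    # Transpose to (number of examples, num_classes), as in compute_cost
    logits = Z3.T
    labels = Y.T
    # Log-softmax, shifted by the row max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -(labels * log_probs).sum(axis=1)
    return per_example.mean()

# All-zero logits mean the network predicts 50/50 for every example
Z3_toy = np.zeros((2, 4))
Y_toy = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 1]], dtype=float)

print(softmax_cross_entropy_mean(Z3_toy, Y_toy))  # log(2), about 0.6931
```

A cost of log(2) is therefore what an untrained two-class network that guesses uniformly would produce, which is a useful reference point when reading the training log.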

5 - Mini-Batch Gradient descent:

Let's build mini-batches from the training set (X, Y) in two steps:

  • Shuffle: We'll create a shuffled version of the training set (X, Y) as shown below. Each column of X and Y represents a training example. Note that the random shuffling is done synchronously between X and Y. Such that after the shuffling the $i^{th}$ column of X is the example corresponding to the $i^{th}$ label in Y. The shuffling step ensures that examples will be split randomly into different mini-batches.
  • Partition: We'll partition the shuffled (X, Y) into mini-batches of size mini_batch_size. Note that the number of training examples is not always divisible by mini_batch_size, so the last mini-batch might be smaller, but you don't need to worry about this.

Let $\lfloor s \rfloor$ represent $s$ rounded down to the nearest integer (this is the math.floor(s) function in Python). For example, with mini_batch_size=64: if the total number of examples $m$ is not a multiple of 64, there will be $\lfloor \frac{m}{mini\_batch\_size}\rfloor$ mini-batches with a full 64 examples, and the number of examples in the final mini-batch will be $m - mini\_batch\_size \times \lfloor \frac{m}{mini\_batch\_size}\rfloor$.
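
As a worked example of this arithmetic, here are the numbers for our training set of 6002 examples with mini-batches of 64:

```python
import math

m = 6002              # number of training examples, from the shapes printed earlier
mini_batch_size = 64

num_full_batches = math.floor(m / mini_batch_size)
last_batch_size = m - mini_batch_size * num_full_batches

print(num_full_batches)  # 93
print(last_batch_size)   # 50
```

So one epoch consists of 93 full mini-batches plus one smaller mini-batch of 50 examples.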

X - input data, of shape (input size, number of examples)
Y - true "label" matrix (one-hot columns: class 0 is cat, class 1 is dog), of shape (number of classes, number of examples)
mini_batch_size - size of the mini-batches, integer

mini_batches - list of synchronous (mini_batch_X, mini_batch_Y)

def random_mini_batches(X, Y, mini_batch_size = 64):
    # number of training examples
    m = X.shape[1]
    mini_batches = []
    # Step 1: Shuffle (X, Y) with the same permutation so columns stay paired
    permutation = list(np.random.permutation(m))
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation].reshape((Y.shape[0],m))

    # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
    # number of full mini-batches; integer division equals math.floor(m / mini_batch_size)
    num_complete_minibatches = m // mini_batch_size
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[:, k * mini_batch_size : k * mini_batch_size + mini_batch_size]
        mini_batch_Y = shuffled_Y[:, k * mini_batch_size : k * mini_batch_size + mini_batch_size]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    # Handling the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[:, num_complete_minibatches * mini_batch_size : m]
        mini_batch_Y = shuffled_Y[:, num_complete_minibatches * mini_batch_size : m]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    return mini_batches
mini_batches = random_mini_batches(X_train, Y_train, 64)

print ("shape of the 1st mini_batch_X:", mini_batches[0][0].shape)
print ("shape of the 1st mini_batch_Y:", mini_batches[0][1].shape)


shape of the 1st mini_batch_X: (12288, 64)
shape of the 1st mini_batch_Y: (2, 64)

What we should remember:

  • Shuffling and Partitioning are the two steps required to build mini-batches.
  • Powers of two are often chosen to be the mini-batch size, e.g., 16, 32, 64, 128.

6 - Backward propagation & parameter updates:

This is where we become grateful to programming frameworks. All the backpropagation and the parameter updates are taken care of in 1 line of code. It is very easy to incorporate this line in the model.

After we compute the cost function, we will create an "optimizer" object. We have to call this object along with the cost when running the tf.Session. When called, it will perform an optimization on the given cost with the chosen method and learning rate.

For instance, for gradient descent the optimizer would be:

    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)

To make the optimization we would do:

    _ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

This computes the backpropagation by passing through the tensorflow graph in reverse order, from cost to inputs.

When coding, we often use _ as a "throwaway" variable to store values that we won't need to use later. Here, _ takes on the evaluated value of optimizer, which we don't need (and c takes the value of the cost variable).
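
To show what a single optimizer step does conceptually, here is a hand-written NumPy sketch of plain gradient descent on a toy quadratic cost. The cost function, its gradient and the starting point are made up for illustration. In tensorflow the gradient is derived from the graph automatically, whereas here it is written out by hand:

```python
import numpy as np

def cost(w):
    # Toy cost with its minimum at w = 3
    return float(np.sum((w - 3.0) ** 2))

def grad(w):
    # Derivative of the toy cost with respect to w
    return 2.0 * (w - 3.0)

w = np.array([0.0])
learning_rate = 0.1

for step in range(50):
    # One "optimizer run": move the parameter against the gradient
    w = w - learning_rate * grad(w)

print(w, cost(w))  # w ends up very close to 3 and the cost close to 0
```

Each sess.run on the optimizer plays the role of one pass through this loop body, applied to all of W1, b1, W2, b2, W3 and b3 at once.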

7 - Building the model:

Now we will bring it all together! We will be calling the functions we had previously implemented. So we'll implement a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.

X_train - training set, of shape (input size, number of training examples)
Y_train - training labels, of shape (output size = CLASSES, number of training examples)
X_test - test set, of shape (input size, number of test examples)
Y_test - test labels, of shape (output size = CLASSES, number of test examples)
learning_rate - learning rate of the optimization
num_epochs - number of epochs of the optimization loop
minibatch_size - size of a minibatch
print_cost - True to print the cost every 100 epochs

parameters - parameters learnt by the model. They can then be used to predict.

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 800, minibatch_size = 64, print_cost = True):
    # (n_x: input size, m : number of examples in the train set)
    (n_x, m) = X_train.shape                        
    # n_y : output size
    n_y = Y_train.shape[0]                        
    # To keep track of the cost
    costs = []                                        
    # Create Placeholders of shape (n_x, n_y)
    X, Y = create_placeholders(n_x, n_y)

    # Initialize parameters
    parameters = initialize_parameters(X_train.shape[0], 100, 12, CLASSES)
    # Forward propagation: Build the forward propagation in the tensorflow graph
    Z3 = forward_propagation(X, parameters)
    # Cost function: Add cost function to tensorflow graph
    cost = compute_cost(Z3, Y)
    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer.
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
    # Initialize all the variables
    init = tf.global_variables_initializer()
    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        # Run the initialization
        sess.run(init)
        # Do the training loop; epochs are counted from 1 to num_epochs
        for epoch in range(1, num_epochs + 1):
            epoch_cost = 0.         # Defines a cost related to an epoch
            # number of minibatches of size minibatch_size in the train set
            num_minibatches = int(m / minibatch_size)
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size)

            for minibatch in minibatches:
                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost", the feed_dict should contain a minibatch for (X,Y).
                _ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                epoch_cost += minibatch_cost / num_minibatches

            # Print the cost every 100 epochs
            if print_cost and epoch % 100 == 0:
                print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
            # Record the cost every 5 epochs for the plot
            if print_cost and epoch % 5 == 0:
                costs.append(epoch_cost)
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs (per fives)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # lets save the parameters in a variable
        parameters = sess.run(parameters)
        print ("Parameters have been trained!")

        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
        # Save our trained model
        saver = tf.train.Saver()
        saver.save(sess, './TensorFlow-first-network')
        return parameters

Run the following cell to train our model. On my machine it takes about 15 minutes, with GPU:

parameters = model(X_train, Y_train, X_test, Y_test)


Cost after epoch 100: 0.149014
Cost after epoch 200: 0.007334
Cost after epoch 300: 0.002902
Cost after epoch 400: 0.000667
Cost after epoch 500: 0.004335
Cost after epoch 600: 0.001176
Cost after epoch 700: 0.001851
Cost after epoch 800: 0.000289
Cost after epoch 900: 0.001448
Cost after epoch 1000: 0.000860

Parameters have been trained!
Train Accuracy: 1.0
Test Accuracy: 0.592

Our algorithm can recognize a cat and dog with about 59% accuracy on the test set, while it fits the training set perfectly.

  • Our model seems big enough to fit the training set well. However, given the difference between train and test accuracy, we could try to add L2 or dropout regularization to reduce overfitting.
  • Think about the session as a block of code to train the model. Each time we run the session on a minibatch, it trains the parameters. In total we have run the session a large number of times (800 epochs) until we obtained well trained parameters.
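
As a rough idea of what L2 regularization would add to the cost, here is a NumPy sketch. The parameter values, the data cost and the strength lambd are made-up numbers, and the scaling convention (half the sum of squared weights) is just one common choice. In tensorflow 1.x a similar term could be built with tf.nn.l2_loss on each weight matrix:

```python
import numpy as np

def l2_penalty(parameters, lambd):
    # Penalize only the weight matrices; biases are conventionally left out
    weights = [value for name, value in parameters.items() if name.startswith("W")]
    return 0.5 * lambd * sum(np.sum(np.square(W)) for W in weights)

toy_parameters = {
    "W1": np.ones((2, 3)),      "b1": np.zeros((2, 1)),
    "W2": 2 * np.ones((1, 2)),  "b2": np.zeros((1, 1)),
}

data_cost = 0.25   # hypothetical cross-entropy cost
lambd = 0.1        # hypothetical regularization strength
total_cost = data_cost + l2_penalty(toy_parameters, lambd)

print(total_cost)  # 0.25 + 0.5 * 0.1 * (6 + 8), i.e. about 0.95
```

Minimizing the total cost instead of the data cost alone pushes the weights towards smaller values, which usually narrows the gap between train and test accuracy.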

8 - Test with your own image:

We can now take a picture of our cat or dog and see the output of our model. To do that:

#test_image = "cat.jpg"
test_image = "dog.jpg"
my_image = read_image(test_image).reshape(1, ROWS*COLS*CHANNELS).T
X = my_image / 255.

Below we use the predict function to predict Cat vs Dog from our X image, using our saved parameters:

def predict(X, parameters):
    W1 = tf.convert_to_tensor(parameters["W1"])
    b1 = tf.convert_to_tensor(parameters["b1"])
    W2 = tf.convert_to_tensor(parameters["W2"])
    b2 = tf.convert_to_tensor(parameters["b2"])
    W3 = tf.convert_to_tensor(parameters["W3"])
    b3 = tf.convert_to_tensor(parameters["b3"])
    params = {"W1": W1,
              "b1": b1,
              "W2": W2,
              "b2": b2,
              "W3": W3,
              "b3": b3}
    x = tf.placeholder("float", [X.shape[0], 1])
    z3 = forward_propagation(x, params)
    p = tf.argmax(z3)
    # The with-block closes the session once the prediction is computed
    with tf.Session() as sess:
        prediction = sess.run(p, feed_dict = {x: X})
    return str(np.squeeze(prediction))
predict(X, parameters)

Output: '1'
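
The returned '1' is a class index. Since prepare_data labels dogs as 1 and cats as 0, it can be turned into a readable name. Here is a small sketch; CLASS_NAMES and label_from_logits are hypothetical helpers and the logits are made up:

```python
import numpy as np

# Hypothetical lookup table matching the labels assigned in prepare_data
CLASS_NAMES = {0: "cat", 1: "dog"}

def label_from_logits(z3):
    # z3 has shape (CLASSES, 1); argmax over axis 0 picks the highest-scoring class
    index = int(np.argmax(z3, axis=0)[0])
    return CLASS_NAMES[index]

print(label_from_logits(np.array([[0.2], [1.7]])))  # dog
print(CLASS_NAMES[int('1')])                        # dog, from the string returned by predict
```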

9 - Restore the graph from .meta file:

When we save the variables, it creates a .meta file. This file contains the graph structure. Therefore, we can import the meta graph using tf.train.import_meta_graph() and restore the values of the graph. Let's import the graph and see all tensors in the graph:

# delete the current graph
tf.reset_default_graph()

# import the graph from the file
imported_graph = tf.train.import_meta_graph('TensorFlow-first-network.meta')

# list all the operations in the graph (each tensor is named <operation name>:<output index>)
for op in tf.get_default_graph().get_operations():
    print (op.name)



tf.train.Saver() saves the variables under their TensorFlow names. Now that we have imported the graph and know we are interested in the W1... and b1... tensors, we can restore the parameters:

with tf.Session() as sess:

    ## Load the entire model previously saved in a checkpoint
    the_Saver = tf.train.import_meta_graph('TensorFlow-first-network' + '.meta')
    the_Saver.restore(sess, './TensorFlow-first-network')

    W1,b1,W2,b2,W3,b3 = sess.run(['W1:0', 'b1:0','W2:0','b2:0', 'W3:0','b3:0'])
    parameters = {"W1": W1,
              "b1": b1,
              "W2": W2,
              "b2": b2,
              "W3": W3,
              "b3": b3}
    #print("W1 = ", parameters["W1"])
    #print("b1 = ", parameters["b1"])
    #print("W2 = ", parameters["W2"])
    #print("b2 = ", parameters["b2"])
    #print("W3 = ", parameters["W3"])
    #print("b3 = ", parameters["b3"])

Now let's try to predict our image again, this time using the parameters restored from our trained model:

predict(X, parameters)

Output: '1'

So I should congratulate you on finishing your first deep network with the TensorFlow framework.

What we should remember from this tutorial:

1. Tensorflow is a programming framework used in deep learning.
2. The two main object classes in tensorflow are Tensors and Operators.
3. When we code in tensorflow we have to take the following steps:
3.1. Create a graph containing Tensors (Variables, Placeholders ...) and Operations (tf.matmul, tf.add, ...)
3.2. Create a session
3.3. Initialize the session
3.4. Run the session to execute the graph
4. We can execute the graph multiple times as you've seen in model()
5. The backpropagation and optimization are automatically done when running the session on the "optimizer" object.

In the next tutorial part, instead of simple deep networks we will use convolutional networks, and you will see what difference we get!

Full tutorial code and cats vs dogs image data-set can be found on my GitHub page.