Convolutional Neural Networks (CNN) explained

Posted May 16, 2019 by Rokas Balsys



Convolutional Neural Networks in TensorFlow:

Welcome to 4th tutorial part! In this part, we will:

  • Implement helper functions that we will use when implementing a TensorFlow model
  • Implement a fully functioning ConvNet using TensorFlow

After this tutorial we will be able to:

  • Build and train a ConvNet in TensorFlow for a classification problem

TensorFlow model:

In the previous tutorial, we built Deep Neural Networks using TensorFlow. Most practical applications of deep learning today are built using programming frameworks, which have many built-in functions you can simply call.

As usual, we will start by loading in the packages:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import cv2
import numpy as np
import matplotlib.pyplot as plt
import math
import tensorflow as tf
ROWS = 64
COLS = 64
CHANNELS = 3
CLASSES = 2
def read_image(file_path):
    img = cv2.imread(file_path, cv2.IMREAD_COLOR)
    return cv2.resize(img, (ROWS, COLS), interpolation=cv2.INTER_CUBIC)

def prepare_data(images):
    m = len(images)
    X = np.zeros((m, ROWS, COLS, CHANNELS), dtype=np.uint8)
    y = np.zeros((1, m), dtype=np.uint8)
    for i, image_file in enumerate(images):
        X[i,:] = read_image(image_file)
        if 'dog' in image_file.lower():
            y[0, i] = 1
        elif 'cat' in image_file.lower():
            y[0, i] = 0
    return X, y

def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y

Run the next cell to load the "Cats vs Dogs" data-set we are going to use:

TRAIN_DIR = 'Train_data/'
TEST_DIR = 'Test_data/'

train_images = [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR)]
test_images =  [TEST_DIR+i for i in os.listdir(TEST_DIR)]

train_set_x, train_set_y = prepare_data(train_images)
test_set_x, test_set_y = prepare_data(test_images)

X_train = train_set_x/255
X_test = test_set_x/255

Y_train = convert_to_one_hot(train_set_y, CLASSES).T
Y_test = convert_to_one_hot(test_set_y, CLASSES).T

In previous tutorial we had built a fully-connected Deep Network for this dataset. But since this is an image dataset, it is more natural to apply a ConvNet to it. To get started, let's examine the shapes of our data:


print ("number of training examples =", X_train.shape[0])
print ("number of test examples =", X_test.shape[0])
print ("X_train shape:", X_train.shape)
print ("Y_train shape:", Y_train.shape)
print ("X_test shape:", X_test.shape)
print ("Y_test shape:", Y_test.shape)

Output:

number of training examples = 6002
number of test examples = 1000
X_train shape: (6002, 64, 64, 3)
Y_train shape: (6002, 2)
X_test shape: (1000, 64, 64, 3)
Y_test shape: (1000, 2)

1 - Create placeholders:

TensorFlow requires that we create placeholders for the input data that will be fed into the model when running the session.

We will implement the function below to create placeholders for the input image X and the output Y. We should not define the number of training examples for the moment. To do so, we could use "None" as the batch size, it will give us the flexibility to choose it later.

Arguments:
n_H0 - scalar, height of an input image
n_W0 - scalar, width of an input image
n_C0 - scalar, number of channels of the input
n_y - scalar, number of classes

Returns:
X - placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
Y - placeholder for the input labels, of shape [None, n_y] and dtype "float"

def create_placeholders(n_H0, n_W0, n_C0, n_y):

    X = tf.placeholder(tf.float32, shape=(None, n_H0, n_W0, n_C0), name="X")
    Y = tf.placeholder(tf.float32, shape=(None, n_y), name="Y")
    
    return X, Y
X, Y = create_placeholders(ROWS, COLS, CHANNELS, CLASSES)
print ("X = ", X)
print ("Y = ", Y)

Output:

X =  Tensor("X_1:0", shape=(?, 64, 64, 3), dtype=float32)
Y =  Tensor("Y_1:0", shape=(?, 2), dtype=float32)

2 - Initialize parameters:

We will initialize weights/filters $W1$ and $W2$ using `tf.contrib.layers.xavier_initializer()`. We don't need to worry about bias variables as we will soon see that TensorFlow functions take care of the bias. Note also that we will only initialize the weights/filters for the conv2d functions. TensorFlow initializes the layers for the fully connected part automatically. We will talk more about that later.

The dimensions for each group of filters will be: [weight, height, channels, filters]

def initialize_parameters():
    
    W1 = tf.get_variable("W1", [4, 4, 3, 32], initializer = tf.contrib.layers.xavier_initializer())
    W2 = tf.get_variable("W2", [2, 2, 32, 32], initializer = tf.contrib.layers.xavier_initializer())

    parameters = {"W1": W1,
                  "W2": W2}
   
    return parameters
tf.reset_default_graph()
with tf.Session() as sess_test:
    parameters = initialize_parameters()
    init = tf.global_variables_initializer()
    sess_test.run(init)
    print("W1 = ", parameters["W1"].eval()[1,1,1])
    print("W2 = ", parameters["W2"].eval()[1,1,1])

Output:

W1 =  [-0.06732719 -0.04869158 -0.09797626  0.08584268  0.02240836  0.08350148
 -0.08144311  0.03549462  0.09043885 -0.00742315  0.07257762 -0.0955046
  0.06858646 -0.01282825 -0.04284175  0.04028637 -0.06008491 -0.07174417
 -0.02601603  0.06803527 -0.00658079 -0.06387079 -0.06514584  0.08079699
  0.0451208   0.09093665  0.02660077 -0.04926057 -0.01445787  0.01430324
 -0.0094941  -0.00567473]
W2 =  [-0.10164022  0.0088262   0.12092815  0.03691474  0.03909318 -0.11921392
 -0.11299625  0.13281281 -0.00964643  0.07956326  0.10702415 -0.02628222
  0.07636186  0.11758269 -0.07492408 -0.10864092 -0.03752776 -0.15154287
 -0.03635778 -0.07357109 -0.14489271 -0.09165138  0.03596993  0.09336357
  0.11486803  0.12517045 -0.1207674  -0.02368325 -0.04226575  0.06590688
 -0.03436987 -0.11981277]

3 - Forward propagation:

In TensorFlow, there are built-in functions that carry out the convolution steps for us.

  • tf.nn.conv2d(X,W1, strides = [1,s,s,1], padding = 'SAME'): given an input $X$ and a group of filters $W1$, this function convolves $W1$'s filters on X. The third input ([1,f,f,1]) represents the strides for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev).
  • tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = 'SAME'): given an input A, this function uses a window of size (f, f) and strides of size (s, s) to carry out max pooling over each window.
  • tf.nn.relu(Z1): computes the elementwise ReLU of Z1 (which can be any shape).
  • tf.contrib.layers.flatten(P): given an input P, this function flattens each example into a 1D vector it while maintaining the batch-size. It returns a flattened tensor with shape [batch_size, k].
  • tf.contrib.layers.fully_connected(F, num_outputs): given a the flattened input F, it returns the output computed using a fully connected layer.

In the last function above (tf.contrib.layers.fully_connected), the fully connected layer automatically initializes weights in the graph and keeps on training them as we train the model. We don't need to initialize those weights when initializing the parameters.

So we will implement the forward_propagation function below to build the following model: CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED.

In detail, we will use the following parameters for all the steps: - Conv2D: stride 1, padding is "SAME"
- ReLU
- Max pool: Use an 8 by 8 filter size and an 8 by 8 stride, padding is "SAME"
- Conv2D: stride 1, padding is "SAME"
- ReLU
- Max pool: Use a 4 by 4 filter size and a 4 by 4 stride, padding is "SAME"
- Flatten the previous output.
- FULLYCONNECTED (FC) layer: We'll apply fully connected layer without an non-linear activation function. We will not call the softmax here. This will result in 2 neurons in the output layer, which then get passed later to a softmax. In TensorFlow, the softmax and cost function are lumped together into a single function, which you'll call in a different function when computing the cost.

Arguments:
X - input dataset placeholder, of shape (input size, number of examples)
parameters - python dictionary containing our parameters "W1", "W2"

Returns:
Z3 - the output of the last LINEAR unit

def forward_propagation(X, parameters):
    # Retrieving the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    W2 = parameters['W2']
    
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X,W1, strides = [1,1,1,1], padding = 'SAME')
    print("Z1 shape", Z1.shape)
    
    # RELU
    A1 = tf.nn.relu(Z1)
    print("A1 shape", A1.shape)
    
    # MAXPOOL: window 8x8, sride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize = [1,8,8,1], strides = [1,8,8,1], padding = 'SAME')
    print("P1 shape", P1.shape)
    
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1,W2, strides = [1,1,1,1], padding = 'SAME')
    print("Z2 shape", Z2.shape)
    
    # RELU
    A2 = tf.nn.relu(Z2)
    print("A2 shape", A2.shape)
    
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize = [1,4,4,1], strides = [1,4,4,1], padding = 'SAME')
    print("P2 shape", P2.shape)
    
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    print("P2 FLATTEN shape", P2.shape)
    
    # FULLY-CONNECTED without non-linear activation function (not call softmax).
    # 2 neurons in output layer. Hint: one of the arguments should be "activation_fn=None" 
    Z3 = tf.contrib.layers.fully_connected(P2, CLASSES, activation_fn=None)
    print("Z3 shape", Z3.shape)

    return Z3
tf.reset_default_graph()
with tf.Session() as sess:
    X, Y = create_placeholders(ROWS, COLS, CHANNELS, CLASSES)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(Z3, {X: np.random.randn(2,64,64,3), Y: np.random.randn(2,CLASSES)})
    print("Z3 =", a)
    print("Z3 shape =", a.shape)

Output:

Z1 shape (?, 64, 64, 32)
A1 shape (?, 64, 64, 32)
P1 shape (?, 8, 8, 32)
Z2 shape (?, 8, 8, 32)
A2 shape (?, 8, 8, 32)
P2 shape (?, 2, 2, 32)
P2 FLATTEN shape (?, 128)
Z3 shape (?, 2)
Z3 = [[-1.3436915   0.34887427]
 [-1.5181915   0.01192094]]
Z3 shape = (2, 2)

4 - Compute cost:

In TensorFlow, there are built-in functions that carry out the convolution steps for us.

  • tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y): computes the softmax entropy loss. This function both computes the softmax activation function as well as the resulting loss.
  • tf.reduce_mean: computes the mean of elements across dimensions of a tensor. Use this to sum the losses over all the examples to get the overall cost.

Arguments:
Z3 - output of forward propagation (output of the last LINEAR unit), of shape (CLASSES, number of examples)
Y - "true" labels vector placeholder, same shape as Z3

Returns:
cost - Tensor of the cost function

def compute_cost(Z3, Y):

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y))
    
    return cost
tf.reset_default_graph()
with tf.Session() as sess:
    X, Y = create_placeholders(ROWS, COLS, CHANNELS, CLASSES)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(cost, {X: np.random.randn(4,64,64,3), Y: np.random.randn(4,CLASSES)})
    print("cost = ", a)

Output:

Z1 shape (?, 64, 64, 32)
A1 shape (?, 64, 64, 32)
P1 shape (?, 8, 8, 32)
Z2 shape (?, 8, 8, 32)
A2 shape (?, 8, 8, 32)
P2 shape (?, 2, 2, 32)
P2 FLATTEN shape (?, 128)
Z3 shape (?, 2)
cost =  1.4327825

5 - Mini-Batch Gradient descent:

I copied mini-batches function from my last Deep Network TensorFlow tutorial, and adopted it to new data-set shape:

Arguments:
X - input data, of shape (input size, number of examples) (m, Hi, Wi, Ci)
Y - true "label" vector (containing 0 if cat, 1 if dog), of shape (1, number of examples) (m, n_y)
mini_batch_size - size of the mini-batches, integer

Returns:
mini_batches - list of synchronous (mini_batch_X, mini_batch_Y)

def random_mini_batches(X, Y, mini_batch_size = 64):
    # number of training examples
    m = X.shape[0]                  
    mini_batches = []
    
    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation,:,:,:]
    shuffled_Y = Y[permutation,:]

    # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
    num_complete_minibatches = math.floor(m/mini_batch_size) # number of mini batches of size mini_batch_size in your partitionning
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:,:,:]
        mini_batch_Y = shuffled_Y[k * mini_batch_size : k * mini_batch_size + mini_batch_size,:]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    
    # Handling the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size : m,:,:,:]
        mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size : m,:]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    
    return mini_batches

6 - Model:

Finally we will merge the helper functions we implemented above to build a model. We will train it on my Cats and Dogs data-set.

The model below should:

  • create placeholders
  • initialize parameters
  • forward propagate
  • compute the cost
  • create an optimizer

Finally we will create a session and run a for loop for num_epochs, get the mini-batches, and then for each mini-batch we will optimize the function.

So we'll implement a three-layer ConvNet in Tensorflow: CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED.

Arguments:
X_train - training set, of shape (None, ROWS, COLS, CHANNELS)
Y_train - test set, of shape (None, n_y = CLASSES)
X_test - training set, of shape (None, ROWS, COLS, CHANNELS)
Y_test - test set, of shape (None, n_y = CLASSES)
learning_rate - learning rate of the optimization
num_epochs - number of epochs of the optimization loop
minibatch_size - size of a minibatch
print_cost - True to print the cost every 100 epochs

Returns:
train_accuracy - real number, accuracy on the train set (X_train)
test_accuracy - real number, testing accuracy on the test set (X_test)
parameters - parameters learnt by the model. They can then be used to predict.

tf.reset_default_graph()
def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.009,
          num_epochs = 200, minibatch_size = 64, print_cost = True):

    (m, n_H0, n_W0, n_C0) = X_train.shape             
    n_y = Y_train.shape[1] 
    
    # To keep track of the cost
    costs = []                                        
    
    # Createing Placeholders of the correct shape
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)

    # Initializing parameters
    parameters = initialize_parameters()
    
    # Forward propagation: Building the forward propagation in the tensorflow graph
    Z3 = forward_propagation(X, parameters)
    
    # Cost function: Adding cost function to tensorflow graph
    cost = compute_cost(Z3, Y)
    
    # Backpropagation: Defining the tensorflow optimizer. Using an AdamOptimizer that minimizes the cost.
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
    
    # Initializing all the variables globally
    init = tf.global_variables_initializer()
     
    # Starting the session to compute the tensorflow graph
    with tf.Session() as sess:
        
        # Runing the initialization
        sess.run(init)
        
        # Doing the training loop
        for epoch in range(num_epochs):

            minibatch_cost = 0.
            # number of minibatches of size minibatch_size in the train set
            num_minibatches = int(m / minibatch_size)
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the optimizer and the cost, the feedict should contain a minibatch for (X,Y).
                _ , temp_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                
                minibatch_cost += temp_cost / num_minibatches
                

            # Print the cost every epoch
            if print_cost == True and epoch % 5 == 0:
                print ("Cost after epoch %i: %f" % (epoch, minibatch_cost))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)
        
        
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()
        
        # lets save the parameters in a variable
        parameters = sess.run(parameters)
        print ("Parameters have been trained!")     
        
        # Calculate the correct predictions
        predict_op = tf.argmax(Z3, 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))
        
        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)
        
        # Saving our trained model
        saver = tf.train.Saver()
        tf.add_to_collection('predict_op', predict_op)
        saver.save(sess, './my-CNN-model')        
                
        return train_accuracy, test_accuracy, parameters

Run the following cell to train your model for 200 epochs:

_, _, parameters = model(X_train, Y_train, X_test, Y_test)

Output:

Z1 shape (?, 64, 64, 32)
A1 shape (?, 64, 64, 32)
P1 shape (?, 8, 8, 32)
Z2 shape (?, 8, 8, 32)
A2 shape (?, 8, 8, 32)
P2 shape (?, 2, 2, 32)
P2 FLATTEN shape (?, 128)
Z3 shape (?, 2)
Cost after epoch 0: 0.701049
Cost after epoch 5: 0.580966
Cost after epoch 10: 0.525494
Cost after epoch 15: 0.509302
...
...
Cost after epoch 195: 0.230628

CNN_cost.png

Parameters have been trained!
Train Accuracy: 0.92952347
Test Accuracy: 0.703


7 - Test with your own image:

When we save the variables, it creates a .meta file. This file contains the graph structure. Therefore, we can import the meta graph using tf.train.import_meta_graph() and restore the values of the graph. Let's import the graph and see all tensors in the graph:

# delete the current graph
tf.reset_default_graph()

# import the graph from the file
imported_graph = tf.train.import_meta_graph('my-CNN-model.meta')

# list all the tensors in the graph
for tensor in tf.get_default_graph().get_operations():
    print (tensor.name)

We can now take a picture of our cat or dog and see the output of our model. To do that:

#test_image = "cat.jpg"
test_image = "dog.jpg"
my_image = read_image(test_image).reshape(1, ROWS, COLS, CHANNELS)
X = my_image / 255.
#print(X.shape)

checkpoint_path = 'my-CNN-model'
tf.reset_default_graph()

with tf.Session() as sess:

    ## Load the entire model previuosly saved in a checkpoint
    print("Load the model from path", checkpoint_path)
    the_Saver = tf.train.import_meta_graph(checkpoint_path + '.meta')
    the_Saver.restore(sess, checkpoint_path)

    ## Identify the predictor of the Tensorflow graph
    predict_op = tf.get_collection('predict_op')[0]

    ## Identify the restored Tensorflow graph
    dataFlowGraph = tf.get_default_graph()

    ## Identify the input placeholder to feed the images into as defined in the model 
    x = dataFlowGraph.get_tensor_by_name("X:0")

    ## Predict the image category
    prediction = sess.run(predict_op, feed_dict = {x: X})

    print("\nThe predicted image class is:", np.squeeze(prediction))


Congratulations! We have finised the tutorial and built a model that recognizes Cat versus Dog with almost 70% accuracy on the test set. If you wish, feel free to play around with this dataset further. You can actually improve its accuracy by spending more time tuning the hyperparameters.

Once again, nice work !
In next tutorial we'll start building CNN in Keras !

Full tutorial code and cats vs dogs image data-set can be found on my GitHub page.