Logistic regression is a binary classification method. In this tutorial series, we will train it on images of cats and dogs and then check whether it can correctly label a picture it has never seen before as a cat (y=0) or a dog (y=1). As we'll see, even a simple algorithm like logistic regression can do this task surprisingly well.
We want a model that can take in any number of inputs and constrain its output to lie between 0 and 1. How do we do that? The answer is simple: one neuron, armed with the sigmoid activation function. With a single neuron, we can also define a cost that measures how close the predicted output is to the true label.
This means that we are going to build a logistic regression model with a neural network mindset.
In this tutorial, I will step you through how to do this, which should also sharpen your intuitions about deep learning.
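To make the one-neuron idea concrete, here is a minimal sketch, not the final implementation from this series. The weights `w`, bias `b`, and input `x` are made-up values chosen only for illustration; the point is that the sigmoid squashes any weighted sum into the interval (0, 1).

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only: 4 input features, one example.
w = np.array([[0.5], [-1.0], [0.25], [2.0]])   # weights, shape (4, 1)
b = 0.1                                        # bias (scalar)
x = np.array([[1.0], [2.0], [3.0], [0.5]])     # one input column, shape (4, 1)

z = np.dot(w.T, x) + b      # weighted sum of the inputs, shape (1, 1)
a = sigmoid(z)              # output of the neuron, always between 0 and 1

print(a.shape)
print(0.0 < a.item() < 1.0)
```

A common convention, assumed here rather than taken from the text above, is to read an output above 0.5 as "dog" (y=1) and anything else as "cat" (y=0).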
In this tutorial series, we will build the general architecture of a learning algorithm: parameter initialization, the cost function and its gradient, and an optimization algorithm (gradient descent). After that, we will combine these pieces, in the right order, into one main model function. Throughout, we will avoid explicit loops (for/while) unless they are really necessary, and rely on vectorized NumPy operations to keep the code fast.
First, import all the packages that you will need during this tutorial:
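As a small illustration of why explicit loops are avoided, the sketch below computes the weighted sum for every example twice, once with a Python loop and once with a single `np.dot` call, and checks that both give the same result. The array sizes and random data are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the example is repeatable
n_features, m = 5, 8                # made-up sizes for illustration
X = rng.random((n_features, m))     # one example per column
w = rng.random((n_features, 1))     # one weight per feature
b = 0.5

# Loop version: handle one example (column) at a time.
z_loop = np.zeros((1, m))
for i in range(m):
    z_loop[0, i] = np.sum(w[:, 0] * X[:, i]) + b

# Vectorized version: one matrix product covers all examples at once.
z_vec = np.dot(w.T, X) + b

print(np.allclose(z_loop, z_vec))
```

Both versions produce an array of shape (1, m); the vectorized one simply hands the iteration over to NumPy.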
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import scipy
For this tutorial, I will use the free Dogs vs. Cats dataset from Kaggle. You can download it here: LINK. I will not use all of the images because the full dataset is too large for our model. In my example, I use 3000 cat and 3000 dog images for training, and 500 cat and 500 dog images for testing. You can download my dataset from here: LINK.
First, we should define our image size. Because the images in the dataset come in different sizes, we need to resize them all to one fixed size; I chose 64×64 pixels for this tutorial. Then I define the training and test image folders. Finally, I collect the paths of all the images in both folders into lists:
ROWS = 64
COLS = 64
CHANNELS = 3
TRAIN_DIR = 'Train_data/'
TEST_DIR = 'Test_data/'
# use this to read full dataset
train_images = [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR)]
test_images = [TEST_DIR+i for i in os.listdir(TEST_DIR)]
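If your folders might contain files other than images, a slightly more defensive way to build these lists is sketched below. The helper name `list_images` and the set of accepted extensions are assumptions for this illustration, not part of the original code.

```python
import os
import tempfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')   # assumed set of image types

def list_images(directory):
    """Return the sorted full paths of the image files found in `directory`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )

# Demonstration on a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('dog.1.jpg', 'cat.0.jpg', 'notes.txt'):
        open(os.path.join(demo_dir, name), 'w').close()
    demo_images = [os.path.basename(p) for p in list_images(demo_dir)]

print(demo_images)   # the text file is skipped
```

Sorting the paths also makes the order of the examples repeatable between runs, since `os.listdir` returns entries in arbitrary order.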
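Next, we need a simple function that reads an image from a file path, resizes it, and returns it for later use:
def read_image(file_path):
    img = cv2.imread(file_path, cv2.IMREAD_COLOR)  # returns None if the file cannot be read
    if img is None:
        raise ValueError('Could not read image: ' + file_path)
    # cv2.resize expects the target size as (width, height)
    return cv2.resize(img, (COLS, ROWS), interpolation=cv2.INTER_CUBIC)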
Now we'll create a function to prepare our data. It loads every image and assigns it a label: 1 if the file path contains 'dog', and 0 if it contains 'cat':
def prepare_data(images):
    m = len(images)
    # X holds the resized images; y holds one label per image (0 = cat, 1 = dog)
    X = np.zeros((m, ROWS, COLS, CHANNELS), dtype=np.uint8)
    y = np.zeros((1, m))
    for i, image_file in enumerate(images):
        X[i, :] = read_image(image_file)
        # A path containing neither word keeps the default label 0
        if 'dog' in image_file.lower():
            y[0, i] = 1
        elif 'cat' in image_file.lower():
            y[0, i] = 0
    return X, y
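The labeling rule can be tried out without loading any images. The sketch below applies the same rule (dog is 1, cat is 0) to a few made-up file paths. The helper `label_from_path` is a hypothetical name, and checking only the file name via `os.path.basename` is my own assumption, made so that a folder name containing 'dog' or 'cat' cannot affect the label.

```python
import os
import numpy as np

def label_from_path(file_path):
    """Return 1 for a dog image and 0 for a cat image, based on the file name."""
    name = os.path.basename(file_path).lower()
    if 'dog' in name:
        return 1
    if 'cat' in name:
        return 0
    raise ValueError('Cannot infer a label from: ' + file_path)

# Made-up paths for illustration.
paths = ['Train_data/cat.0.jpg', 'Train_data/dog.0.jpg', 'Train_data/dog.1.jpg']

y = np.zeros((1, len(paths)))
for i, path in enumerate(paths):
    y[0, i] = label_from_path(path)

print(y)
```

Raising an error for an unrecognized name is a design choice of this sketch: it avoids silently labeling an unrelated file as a cat.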
Now, let's call this function to read all the training and test images in our folders:
train_set_x, train_set_y = prepare_data(train_images)
test_set_x, test_set_y = prepare_data(test_images)
Now we should reshape each image of shape (ROWS, COLS, CHANNELS) into a column vector of shape (ROWS ∗ COLS ∗ CHANNELS, 1). After this, each column of our training (and test) dataset represents one flattened image:
train_set_x_flatten = train_set_x.reshape(train_set_x.shape[0], CHANNELS*COLS*ROWS).T
test_set_x_flatten = test_set_x.reshape(test_set_x.shape[0], -1).T
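To see what the reshape and transpose do, here is a toy sketch with three tiny 2×2 "images" instead of real 64×64 ones. The sizes are assumptions chosen to keep the example readable. After flattening, each column holds one image.

```python
import numpy as np

m, rows, cols, channels = 3, 2, 2, 3        # toy sizes, not the tutorial's 64x64x3
X = np.arange(m * rows * cols * channels, dtype=np.uint8)
X = X.reshape(m, rows, cols, channels)      # same layout as train_set_x

# Flatten each image into a row, then transpose so each image is a column.
X_flatten = X.reshape(X.shape[0], -1).T

print(X_flatten.shape)                      # (rows*cols*channels, m)
# Column i of the flattened array is image i unrolled into a vector.
print(np.array_equal(X_flatten[:, 1], X[1].ravel()))
```

Note that `X.reshape(-1, m)` would produce the same shape but would mix pixels from different images in each column, which is why the reshape-then-transpose form is used.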
Let's print our data shapes:
print("train_set_x shape " + str(train_set_x.shape))
print("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print("train_set_y shape: " + str(train_set_y.shape))
print("test_set_x shape " + str(test_set_x.shape))
print("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print("test_set_y shape: " + str(test_set_y.shape))
Running this should print the following shapes; the image counts (6002 and 1000 here) reflect the number of files found in each folder:
train_set_x shape (6002, 64, 64, 3)
train_set_x_flatten shape: (12288, 6002)
train_set_y shape: (1, 6002)
test_set_x shape (1000, 64, 64, 3)
test_set_x_flatten shape: (12288, 1000)
test_set_y shape: (1, 1000)
To represent color images, the red, green, and blue (RGB) channels must be specified for each pixel, so each pixel value is actually a vector of three numbers ranging from 0 to 255. (OpenCV's cv2.imread returns the channels in BGR order, which does not matter here as long as the training and test images are loaded the same way.)
One common preprocessing step in machine learning is to center and standardize the dataset: subtract the mean of the whole NumPy array from each example, and then divide each example by the standard deviation of the whole array. For image datasets, however, it is simpler and more convenient, and works almost as well, to just divide every value in the dataset by 255 (the maximum value of a pixel channel):
train_set_x = train_set_x_flatten/255
test_set_x = test_set_x_flatten/255
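Here is a short sketch of the scaling step on made-up pixel values: dividing a `uint8` array by 255 yields floating-point values between 0 and 1. For comparison, the standardization mentioned above is shown as well. The toy array is an assumption for illustration only.

```python
import numpy as np

# Made-up pixel data: 4 pixel values (rows) for 2 examples (columns).
pixels = np.array([[0, 255],
                   [64, 128],
                   [32, 16],
                   [255, 0]], dtype=np.uint8)

scaled = pixels / 255                 # true division gives float64 values in [0, 1]
print(scaled.dtype, scaled.min(), scaled.max())

# Standardization over the whole array, for comparison.
standardized = (pixels - pixels.mean()) / pixels.std()
print(np.isclose(standardized.mean(), 0.0), np.isclose(standardized.std(), 1.0))
```

The division by 255 needs no statistics from the data, so exactly the same transformation can be applied to the training set, the test set, and any new image.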
Full tutorial code:
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import scipy
ROWS = 64
COLS = 64
CHANNELS = 3
TRAIN_DIR = 'Train_data/'
TEST_DIR = 'Test_data/'
train_images = [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR)]
test_images = [TEST_DIR+i for i in os.listdir(TEST_DIR)]
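def read_image(file_path):
    img = cv2.imread(file_path, cv2.IMREAD_COLOR)  # returns None if the file cannot be read
    if img is None:
        raise ValueError('Could not read image: ' + file_path)
    # cv2.resize expects the target size as (width, height)
    return cv2.resize(img, (COLS, ROWS), interpolation=cv2.INTER_CUBIC)
def prepare_data(images):
    m = len(images)
    # X holds the resized images; y holds one label per image (0 = cat, 1 = dog)
    X = np.zeros((m, ROWS, COLS, CHANNELS), dtype=np.uint8)
    y = np.zeros((1, m))
    for i, image_file in enumerate(images):
        X[i, :] = read_image(image_file)
        # A path containing neither word keeps the default label 0
        if 'dog' in image_file.lower():
            y[0, i] = 1
        elif 'cat' in image_file.lower():
            y[0, i] = 0
    return X, y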
train_set_x, train_set_y = prepare_data(train_images)
test_set_x, test_set_y = prepare_data(test_images)
train_set_x_flatten = train_set_x.reshape(train_set_x.shape[0], ROWS*COLS*CHANNELS).T
test_set_x_flatten = test_set_x.reshape(test_set_x.shape[0], -1).T
print("train_set_x shape " + str(train_set_x.shape))
print("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print("train_set_y shape: " + str(train_set_y.shape))
print("test_set_x shape " + str(test_set_x.shape))
print("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print("test_set_y shape: " + str(test_set_y.shape))
train_set_x = train_set_x_flatten/255
test_set_x = test_set_x_flatten/255
Conclusion:
Now we know how to prepare our image dataset. In the next tutorial, we will create the general architecture of the learning algorithm and start building its functions.