Understanding Logistic Regression

Posted March 25, 2019 by Rokas Balsys

Sigmoid and Sigmoid derivative

To learn about Logistic Regression, at first we need to learn Logistic Regression basic properties, and only then we will be able to build a machine learning model on a real-world application. So we will do everything step by step.

Classification techniques are an essential part of machine learning and data mining applications. Most problems in Data Science are classification problems. There are a lot of classification problems that are available, but the logistics regression is a common and is a useful regression method for solving the binary classification problems.

Logistic Regression can be used for various classification problems such as spam detection, prediction if customer will purchase a particular product or he will choose another competitor, whether the user will click on a given advertisement link or not, and there may be many more examples.

Logistic Regression is one of the most simple and commonly used Machine Learning algorithms for two-class classification. It is easy to implement and can be used as the baseline for any binary classification problem. Its basic fundamental concepts are also constructive in deep learning. Logistic regression describes and estimates the relationship between one dependent binary variable and independent variables.

Numpy is the main and the most used package for scientific computing in Python. It is maintained by a large community (www.numpy.org). In this tutorial we will learn several key numpy functions such as np.exp and np.reshape. You will need to know how to use these functions for future deep learning tutorials.

At first we must learn implement sigmoid function. It is a logistic function which gives an ‘S’ shaped curve that can take any real-valued number and map it into a value between 0 and 1. If the curve goes to positive infinity, y predicted will become 1, and if the curve goes to negative infinity, y predicted will become 0. If the output of the sigmoid function is more than 0.5, we can classify the outcome as 1 or YES, and if it is less than 0.5, we can classify it as 0 or NO. For example: If the output is 0.75, we can say in terms of probability that there is a 75 percent chance that patient will suffer from cancer.

Before using np.exp(), you will use math.exp() to implement the sigmoid function. You will then see why np.exp() is preferable to math.exp(). So here is sigmoid activation function: $$f(x) = \frac{1}{1+e^{-x}}\tag{1}$$

Sigmoid is also known as the logistic function. It is a non-linear function used not only in Machine Learning (Logistic Regression), but also in Deep Learning. Sigmoid function curve looks like a S-shape:


Lets write the code to see an example with math.exp().

import math

def basic_sigmoid(x):
    s = 1/(1+math.exp(-x))
    return s

Lets try to run above function: basic_sigmoid(1). As a result we should receive "0.7310585786300049".

Actually, we rarely use the "math" library in deep learning because the inputs of the functions are real numbers. In deep learning we mostly use matrices and vectors. This is why numpy is more powerful and useful. For example if we'll try to run a list into above function:

x = [1, 2, 3]

If we'll run above code, we will receive an error about bad operand. If we would like to run such lists, we would need to modify our basic_sigmoid() function, inserting for loop, but then we will loose computation power. So in this case, we'll write sigmoid function to use vectors instead of scalars:

import numpy as np

def sigmoid(x):
    s = 1/(1+np.exp(-x)) 
    return s

x could now be either a real number, a vector, or a matrix. Data structures we use in numpy to represent these shapes are vectors or matrices called numpy arrays. Now we can call for example this function:

x = np.array([3, 2, 1])

As a result we should see: "[0.95257413 0.88079708 0.73105858]".

When constructing Neural Network (NN) models, one of the primary considerations is choosing activation functions for hidden and output layers that are differentiable. This is because calculating the backpropagated error signal that is used to determine NN parameter updates requires the gradient of the activation function gradient. Three of the most commonly-used activation functions used in NNs are the relu function, the logistic sigmoid function, and tangent function. Simply talking we need to compute gradients to optimize loss functions using back propagation. Let's code your first sigmoid gradient function: $$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$ We'll code above function in two steps:
1. Set s to be the sigmoid of x. we'll use sigmoid(x) function.
2. Then we compute \(\sigma'(x) = s(1-s)\):

import numpy as np

def sigmoid_derivative(x):
    s = sigmoid(x)
    ds = s*(1-s)
    return ds

Above we compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x. We can store the output of the sigmoid function into variables and then use it to calculate the gradient.
Lets test our code:

x = np.array([3, 2, 1])

As a result we receive "[0.04517666 0.10499359 0.19661193]"

To visualise our sigmoid and sigmoid_derivative functions we can generate data from -10 to 10 and use matplotlib to plot these functions. Below is full code used to print sigmoid and sigmoid_derivative functions:

from matplotlib import pylab
import pylab as plt
import numpy as np

def sigmoid(x):
    s = 1/(1+np.exp(-x))   
    return s

def sigmoid_derivative(x):
    s = sigmoid(x)
    ds = s*(1-s)
    return ds

# linespace generate an array from start and stop value, 100 elements
values = plt.linspace(-10,10,100)

# prepare the plot, associate the color r(ed) or b(lue) and the label 
plt.plot(values, sigmoid(values), 'r')
plt.plot(values, sigmoid_derivative(values), 'b')

# Draw the grid line in background.

# Title & Subtitle
plt.title('Sigmoid and Sigmoid derivative functions')

# plt.plot(x)
plt.xlabel('X Axis')
plt.ylabel('Y Axis')

# create the graph

As a result we receive following graph:


Above curve in red is plot of our sigmoid function and curve in red color is our sigmoid_derivative function.

In this tutorial we reviewed sigmoid activation functions used in neural network literature and sigmoid derivative calculation. Note that there are also many other options for activation functions not covered here: e.g. tanh, relu, softmax, etc. Before writing Logistic Regression classification code, we still need to cover what is array reshaping, rows normalization, broadcasting and vectorization, we will cover them in our second tutorial.