Deep Neural Networks step by step
Posted May 3, 2019 by Rokas Balsys
L-Model Backward module:
In this part we will implement the backward function for the whole network. Recall that when we implemented the L_model_forward function, at each iteration we stored a cache containing (A_prev, W, b, and Z). In the backpropagation module we will use those variables to compute the gradients. So in the L_model_backward function we will iterate through all the layers backward, starting from the output layer $L$; on each step we will use the cached values for layer $l$ to backpropagate through layer $l$. The figure below shows the backward pass.
To backpropagate through this network, we know that the output is $A^{[L]} = \sigma(Z^{[L]})$. Our code therefore needs to compute $dAL = \frac{\partial \mathcal{L}}{\partial A^{[L]}}$. To do so, we use this formula (derived using calculus, which you don't need to remember):
# derivative of cost with respect to AL
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
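Where does this formula come from? For a single example with the cross-entropy loss we've been using, $$\mathcal{L}(A^{[L]}, Y) = -\big(Y \log A^{[L]} + (1 - Y)\log(1 - A^{[L]})\big)$$ and differentiating with respect to $A^{[L]}$ gives $$\frac{\partial \mathcal{L}}{\partial A^{[L]}} = -\left(\frac{Y}{A^{[L]}} - \frac{1 - Y}{1 - A^{[L]}}\right)$$ which is exactly the np.divide expression above (the $\frac{1}{m}$ averaging over examples is applied later, inside linear_backward).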
We can then use this post-activation gradient dAL to keep going backward. As seen in the figure above, we can now feed dAL into the LINEAR->SIGMOID backward function we implemented (which will use the cached values stored by the L_model_forward function). After that, we use a for loop to iterate through all the other layers with the LINEAR->RELU backward function, storing each dA, dW, and db in the grads dictionary using this formula: $$grads["dW" + str(l)] = dW^{[l]}$$ For example, for $l=3$ this stores $dW^{[3]}$ in grads["dW3"]. A minimal sketch of these backward helpers follows.
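If you don't have the previous part open, here is a minimal sketch of the backward helpers this section relies on. The real implementations were built earlier in this series; this sketch assumes the cache layout from there, where each cache is a pair (linear_cache, activation_cache) with linear_cache = (A_prev, W, b) and activation_cache = Z:

import numpy as np

def linear_backward(dZ, cache):
    # cache holds (A_prev, W, b) saved during the forward pass
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = (1 / m) * np.dot(dZ, A_prev.T)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db

def relu_backward(dA, activation_cache):
    Z = activation_cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0  # gradient is zero wherever the ReLU was inactive
    return dZ

def sigmoid_backward(dA, activation_cache):
    Z = activation_cache
    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)  # sigmoid'(Z) = s * (1 - s)
    return dZ

def linear_activation_backward(dA, cache, activation):
    # unpack the (linear_cache, activation_cache) pair for this layer
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
    else:  # "sigmoid"
        dZ = sigmoid_backward(dA, activation_cache)
    return linear_backward(dZ, linear_cache)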
Code for our L_model_backward function:
Arguments:
AL - probability vector, output of the forward propagation (L_model_forward()).
Y - true "label" vector (containing 0 if non-cat, 1 if cat).
caches - list of caches containing:

every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1), i.e. l = 0...L-2).
the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1]).
Return:
grads - a dictionary with the gradients:

grads["dA" + str(l)] = ...
grads["dW" + str(l)] = ...
grads["db" + str(l)] = ...
def L_model_backward(AL, Y, caches):
    grads = {}
    # the number of layers
    L = len(caches)
    m = AL.shape[1]
    # after this line, Y is the same shape as AL
    Y = Y.reshape(AL.shape)

    # Initializing the backpropagation
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Lth layer (SIGMOID -> LINEAR) gradients.
    # Inputs: "dAL, current_cache".
    # Outputs: "grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)]"
    current_cache = caches[L - 1]
    grads["dA" + str(L - 1)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, "sigmoid")

    # Loop from l=L-2 down to l=0
    for l in reversed(range(L - 1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: "grads["dA" + str(l + 1)], current_cache".
        # Outputs: "grads["dA" + str(l)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]"
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 1)], current_cache, "relu")
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp

    return grads
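As a quick sanity check, we can wire this up with the functions from the earlier parts of this series (initialize_parameters_deep and L_model_forward are assumed from there; the seed, layer sizes, and data below are made up purely for illustration):

np.random.seed(1)
X = np.random.randn(4, 5)        # 4 features, 5 examples
Y = np.array([[1, 0, 1, 1, 0]])  # labels for the 5 examples

# 2-layer net: LINEAR -> RELU -> LINEAR -> SIGMOID
parameters = initialize_parameters_deep([4, 3, 1])
AL, caches = L_model_forward(X, parameters)
grads = L_model_backward(AL, Y, caches)

# each gradient must match the shape of the parameter it will update
print(grads["dW1"].shape)  # (3, 4), same as parameters["W1"]
print(grads["db2"].shape)  # (1, 1), same as parameters["b2"]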
Update Parameters module:
In this section we will update the parameters of the model using gradient descent: $$W^{[l]} = W^{[l]} - \alpha \text{ } dW^{[l]}$$ $$b^{[l]} = b^{[l]} - \alpha \text{ } db^{[l]}$$ where $\alpha$ is the learning rate. After computing the updated parameters, we store them in the parameters dictionary.
Code for our update_parameters function:
Arguments:
parameters - python dictionary containing our parameters.
grads - python dictionary containing our gradients, output of L_model_backward.
Return:
parameters - python dictionary containing our updated parameters:

parameters["W" + str(l)] = ...
parameters["b" + str(l)] = ...
def update_parameters(parameters, grads, learning_rate):
    # number of layers in the neural network
    L = len(parameters) // 2

    # Update rule for each parameter
    for l in range(L):
        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]

    return parameters
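To preview how these modules fit together (we'll do this properly in the next part), here is a rough sketch of a gradient descent training loop. It assumes compute_cost and the other functions from this series; the learning_rate and num_iterations values are just illustrative defaults:

def train(X, Y, layers_dims, learning_rate=0.0075, num_iterations=2500):
    # assumes initialize_parameters_deep, L_model_forward, compute_cost,
    # L_model_backward and update_parameters from this series
    parameters = initialize_parameters_deep(layers_dims)
    for i in range(num_iterations):
        AL, caches = L_model_forward(X, parameters)   # forward pass
        cost = compute_cost(AL, Y)                    # cross-entropy cost
        grads = L_model_backward(AL, Y, caches)       # backward pass
        parameters = update_parameters(parameters, grads, learning_rate)
        if i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    return parameters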
Congrats on implementing all the functions required for building a deep neural network! It was a long tutorial, but it only gets better from here. In the next part we'll put all of these pieces together to build an L-layer (deep) neural network, and we'll use it to classify cat vs. dog images.