Reinforcement Learning tutorial

Posted February 24 by Rokas Balsys

A.I. learns to play Pong with DQN

There's a huge difference between reading about Reinforcement Learning and actually implementing it. In this tutorial, I'll implement a Deep Neural Network for Reinforcement Learning (a Deep Q Network), and we will watch it learn until it becomes good enough to beat the computer in Pong!

By the end of this post, you’ll be able to do the following:
- Write a Neural Network from scratch.
- Implement a Deep Q Network with Reinforcement Learning.
- Build an A.I. for Pong that can beat the computer in fewer than 300 lines of Python.
- Use OpenAI Gym.

Given my limited time, and since this is for learning purposes, I am not aiming for a perfectly trained agent, but I hope this project helps people get familiar with the basic process of DQN algorithms and Keras. Training the 4 different agents in this tutorial took 2 days on a computer with a GPU. Production-quality results would require a lot more training and tuning, which is not my focus here.

Tutorial prerequisites: familiarity with Neural Networks, supervised learning, TensorFlow 1.15, Keras 2.2.4 and OpenAI Gym.

This tutorial is part of my tutorial series and reuses the code structure from the previous parts, so it is easier to follow line by line if you start from the first part. There won't be any theory here, because we already covered everything we need in the previous tutorials; we will just change several lines of code so that our network learns to play Pong instead of CartPole.


1. Open up the code to follow along and copy or clone it.
2. Follow the instructions for installing OpenAI Gym. You may need to install cmake first.
3. Run pip install gym[atari].
4. Let's get to the next part.

We are given the following pieces:

1. A sequence of images (frames) of the Pong game.
2. An indication of when we've won or lost the game.
3. An opponent agent, the traditional Pong computer player.
4. An agent we control, which can take one of 6 actions at each frame.
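Item 4 can be made concrete with a small sketch. It assumes the standard Atari Pong action set in the order gym reports it through `env.unwrapped.get_action_meanings()` (NOOP, FIRE, RIGHT, LEFT, RIGHTFIRE, LEFTFIRE), and that in Pong RIGHT moves the paddle up and LEFT moves it down. The names `PONG_ACTIONS`, `SIMPLE_ACTIONS` and `to_env_action` are hypothetical; the sketch reduces the 6 actions to the three moves we actually need (stay, up, down).

```python
# Hypothetical helper: map three simple moves onto Atari Pong's 6 actions.
# Assumption: names and order match what gym's
# env.unwrapped.get_action_meanings() reports for Pong.
PONG_ACTIONS = ["NOOP", "FIRE", "RIGHT", "LEFT", "RIGHTFIRE", "LEFTFIRE"]

# Assumption: in Pong, RIGHT moves the paddle up and LEFT moves it down.
SIMPLE_ACTIONS = {"stay": "NOOP", "up": "RIGHT", "down": "LEFT"}


def to_env_action(move: str) -> int:
    """Return the gym action index (0-5) for 'stay', 'up' or 'down'."""
    return PONG_ACTIONS.index(SIMPLE_ACTIONS[move])


if __name__ == "__main__":
    for move in SIMPLE_ACTIONS:
        print(move, "->", to_env_action(move))  # stay -> 0, up -> 2, down -> 3
```

Giving the network 3 outputs instead of 6 shrinks the space it has to learn over; whether the tutorial's own code makes the same reduction is an assumption here.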

Can we use these pieces to train our agent to beat the computer? Moreover, can we make our solution general enough that it could be reused to win in games other than Pong?


Indeed, we can! We'll do this by building a Neural Network that takes in each image and outputs a command telling our A.I. which move to make.


We can break this down into the following steps:
1. Take in images from the game and preprocess them (remove color and background, down-sample, etc.).
2. Use the Neural Network to estimate the value (Q-value) of each possible action.
3. Choose an action from those estimates, with occasional random moves for exploration, and tell the agent to move up, down or stay in the same position.
4. If the round is over (you missed the ball or the opponent missed the ball), find whether you won or lost.
5. When the episode has finished (someone got to 21 points), pass the result through the training algorithm.
6. Repeat this process until our agent is tuned to the point where it can beat the computer.

That's basically it!
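Step 1 can be sketched with NumPy. This follows a widely used Pong preprocessing recipe rather than the tutorial's own code: crop away the score board and borders, down-sample by a factor of 2, keep one color channel, zero out the two background color values (144 and 109, assumed here), and set everything else (paddles and ball) to 1. The function name `preprocess` is hypothetical.

```python
import numpy as np


def preprocess(frame: np.ndarray) -> np.ndarray:
    """Turn a 210x160x3 Pong frame into an 80x80 binary float image.

    Crop rows, step size and background values (144, 109) are assumptions
    taken from a common Pong recipe, not from the tutorial's code.
    """
    # Crop rows 35..194 (drop score board and bottom border), keep every
    # second row and column, and keep only the first color channel -> 80x80.
    img = frame[35:195:2, ::2, 0].copy()  # copy so the caller's frame is untouched
    img[img == 144] = 0   # erase background color 1
    img[img == 109] = 0   # erase background color 2
    img[img != 0] = 1     # paddles and ball become 1
    return img.astype(np.float32)


if __name__ == "__main__":
    # Fake frame: background everywhere plus a small bright "ball".
    fake = np.full((210, 160, 3), 144, dtype=np.uint8)
    fake[100:104, 80:82] = 236
    out = preprocess(fake)
    print(out.shape, out.sum())
```

Because a single frame shows where the ball is but not where it is heading, DQN agents usually stack the last few preprocessed frames as the network input.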

OK, now that we've described the problem and its solution, let's get to writing some code (details in the YouTube tutorial)!
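In a DQN, steps 2 and 3 usually work on Q-values rather than on a probability distribution: the network estimates a value for each action and the agent mostly takes the best one, except that with probability epsilon it takes a random action so it keeps exploring. Below is a minimal epsilon-greedy sketch; `choose_action`, `decay_epsilon` and the decay constants are hypothetical, not taken from the tutorial code.

```python
import random
from typing import Sequence


def choose_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice over a list of Q-values (one per action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniformly random action
    # exploit: index of the largest Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon: float, decay: float = 0.999, minimum: float = 0.01) -> float:
    """Shrink epsilon after each step so the agent explores less over time."""
    return max(minimum, epsilon * decay)


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.7, -0.2]  # hypothetical Q-values for stay, up, down
    print(choose_action(q, epsilon=0.0, rng=rng))  # epsilon 0 is always greedy -> 1
```

Starting epsilon near 1 and decaying it lets the agent explore widely early on and rely on its Q-value estimates once they become useful.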


You can check the full video tutorial code and download the trained models on GitHub.


I started by training the simplest model, a plain Deep Q Network, with these parameters:

```python
self.ddqn = False     # use Double Deep Q-Networks
self.dueling = False  # use the dueling network architecture
self.USE_PER = False  # use Prioritized Experience Replay
```
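With all three flags set to False the agent is a plain DQN: it regresses Q(s, a) toward r + γ·max over a' of Q_target(s', a'), or toward r alone when the episode has ended. A minimal NumPy sketch of that batch target follows, assuming `q_next` holds the target network's Q-values for the next states; the function name and the default γ = 0.99 are illustrative.

```python
import numpy as np


def dqn_targets(rewards, dones, q_next, gamma=0.99):
    """Plain DQN targets: r + gamma * max_a' Q_target(s', a'), or just r if done.

    rewards, dones: shape (batch,); dones holds 1 for terminal transitions.
    q_next: target-network Q-values for the next states, shape (batch, n_actions).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    best_next = np.max(np.asarray(q_next, dtype=np.float64), axis=1)  # greedy next value
    return rewards + gamma * best_next * (1.0 - dones)  # no bootstrap past the end


if __name__ == "__main__":
    print(dqn_targets([1.0, 0.0], [0, 1], [[0.5, 2.0], [3.0, 1.0]], gamma=0.5))
```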

I expected this model to learn but to perform the worst of the four; its maximum average score was close to 5. But let's look at how the other models performed.


OK, I thought, the simple DQN model performed quite well, so I'll simply add the double network to it (self.ddqn = True) and see how it works. But the results did not satisfy me: it performed even worse.
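Setting self.ddqn = True changes only the bootstrap term: the online network chooses the best next action and the target network evaluates it, which is meant to reduce plain DQN's tendency to overestimate Q-values. The sketch below assumes `q_next_online` and `q_next_target` (hypothetical names) are the two networks' predictions for the next states; the function name is also illustrative.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double DQN: choose a' with the online net, evaluate it with the target net."""
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    q_next_online = np.asarray(q_next_online, dtype=np.float64)
    q_next_target = np.asarray(q_next_target, dtype=np.float64)
    best_actions = np.argmax(q_next_online, axis=1)  # selection (online net)
    rows = np.arange(len(best_actions))
    evaluated = q_next_target[rows, best_actions]     # evaluation (target net)
    return rewards + gamma * evaluated * (1.0 - dones)


if __name__ == "__main__":
    online = [[1.0, 0.0], [0.0, 1.0]]
    target = [[5.0, 9.0], [2.0, 4.0]]
    # Plain DQN would use max(target) = [9, 4]; Double DQN uses [5, 4] here.
    print(double_dqn_targets([0.0, 0.0], [0, 0], online, target, gamma=1.0))
```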


Then I added the dueling network (self.dueling = True), expecting it to clearly outperform the two models trained above. But again the results were quite similar, with an average score just a little above zero.
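With self.dueling = True the network ends in two streams, a state value V(s) and per-action advantages A(s, a), which are recombined as Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). Subtracting the mean keeps V and A identifiable. Here is a NumPy sketch of just that combining step, with a hypothetical function name; in a Keras model the same arithmetic would sit inside the network.

```python
import numpy as np


def dueling_q(value, advantages):
    """Combine dueling streams: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    value: shape (batch, 1); advantages: shape (batch, n_actions).
    """
    value = np.asarray(value, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)
    return value + advantages - advantages.mean(axis=1, keepdims=True)


if __name__ == "__main__":
    print(dueling_q([[2.0]], [[1.0, 3.0]]))
```

A side effect of this choice is that the mean of Q over actions always equals V(s).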


My final hope was self.USE_PER = True: it performed best in our CartPole game, so it should be the best here too. But the results were even worse.
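Prioritized Experience Replay samples transition i with probability P(i) = p_i^α / Σ_k p_k^α, where the priority p_i is typically |TD error| + ε, and corrects the resulting bias with importance-sampling weights w_i = (N·P(i))^(-β), divided by the largest weight. The sketch below stores priorities in a plain list instead of the SumTree that is normally used for speed; the class name and the α = 0.6, β = 0.4, ε = 0.01 defaults are illustrative assumptions.

```python
import random


class SimplePrioritizedReplay:
    """Proportional PER without a SumTree (O(n) sampling; fine for a sketch)."""

    def __init__(self, alpha=0.6, beta=0.4, eps=0.01, seed=None):
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.items, self.priorities = [], []  # priorities already raised to alpha
        self.rng = random.Random(seed)

    def add(self, transition, td_error=1.0):
        self.items.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]  # P(i)
        idxs = self.rng.choices(range(len(self.items)), weights=probs, k=batch_size)
        n = len(self.items)
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]  # importance sampling
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # normalize so the largest is 1
        return idxs, [self.items[i] for i in idxs], weights

    def update(self, idxs, td_errors):
        """Refresh priorities with the new TD errors after a training step."""
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha


if __name__ == "__main__":
    buf = SimplePrioritizedReplay(seed=0)
    buf.add("low-error transition", td_error=0.0)
    buf.add("high-error transition", td_error=10.0)
    _, batch, _ = buf.sample(10)
    print(batch)
```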



From these results I conclude that when the Q-function of an environment is too complex to learn, DQN can fail miserably. This is why this is my last DQN tutorial. Policy Gradient methods are generally believed to apply to a wider range of problems because they operate directly in policy space, so in the coming tutorial I will cover Policy Gradient reinforcement learning.