Cartpole DQN
This is an introductory tutorial to Reinforcement Learning. To build everything up from the basics, I will start with a simple game called CartPole.
Cartpole Double DQN
This is the second reinforcement learning tutorial, where we'll use two (Double) neural networks to train our main model.
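To give a rough idea of what "two networks" means here, the sketch below (hypothetical helper, assuming Keras-style models with a `predict` method, not the tutorial's exact code) shows how a Double DQN target can be formed: the online network chooses the next action, and the separate target network evaluates it.

```python
import numpy as np

def double_dqn_targets(model, target_model, states, actions, rewards,
                       next_states, dones, gamma=0.99):
    """Double DQN targets: the online model picks the next action,
    the target model evaluates it."""
    q_values = model.predict(states, verbose=0)                 # Q(s, .) from the online network
    next_online = model.predict(next_states, verbose=0)         # used to choose a* = argmax_a Q(s', a)
    next_target = target_model.predict(next_states, verbose=0)  # used to evaluate Q_target(s', a*)

    best_actions = np.argmax(next_online, axis=1)
    targets = q_values.copy()
    for i in range(len(states)):
        if dones[i]:
            targets[i, actions[i]] = rewards[i]
        else:
            targets[i, actions[i]] = rewards[i] + gamma * next_target[i, best_actions[i]]
    return targets
```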
Cartpole Dueling DDQN
In this post, we'll cover Dueling DQN networks for reinforcement learning. This architecture is an improvement over the agent from our previous DDQN tutorial.
Epsilon-Greedy DQN
In this part we'll cover the Epsilon-Greedy method used in Deep Q-Learning, and we'll prepare our source code for the PER method.
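For reference, here is a minimal sketch of the epsilon-greedy rule with a simple decay schedule (illustrative values, not the tutorial's exact settings):

```python
import random
import numpy as np

def epsilon_greedy_action(q_values, epsilon):
    """With probability epsilon explore (random action),
    otherwise exploit the action with the highest predicted Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

# A common schedule: start fully random and decay toward a small floor.
epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.999
epsilon = max(epsilon_min, epsilon * epsilon_decay)  # applied once per step or episode
```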
Prioritized Experience Replay
Now we will change the sampling distribution by using a criterion that defines the priority of each experience tuple.
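To make the idea concrete, here is a toy sketch of proportional prioritization (list-based for clarity; a real implementation typically uses a SumTree, and this omits importance-sampling weights):

```python
import numpy as np

class SimplePER:
    """Toy proportional prioritized replay: sample transitions with probability
    proportional to priority^alpha, where priority comes from the TD error."""
    def __init__(self, capacity=10000, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:   # drop the oldest entry when full
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        probs = np.array(self.priorities) / np.sum(self.priorities)
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx
```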
DQN PER with CNN
Now I will show you how to implement DQN with a CNN. After this tutorial, you'll be able to create an agent that successfully plays almost 'any' game using only pixel inputs.
Pong with DQN
In this tutorial, I'll implement a deep neural network for reinforcement learning (a Deep Q-Network), and we'll watch it learn until it finally becomes good enough to beat the computer at Pong!
RL agents Beyond DQN
To wrap up deep reinforcement learning, I'll introduce the types of agents beyond DQNs (value optimization, model-based, policy optimization, and imitation learning). We'll implement Policy Gradient!
Advantage Actor-Critic (A2C)
Today, we'll study a reinforcement learning method we can call a 'hybrid method': Actor-Critic. This algorithm combines the value optimization and policy optimization approaches.
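Conceptually, the "hybrid" means one head (the critic) learns state values while the other (the actor) learns the policy. A minimal NumPy sketch of the two loss terms (illustrative only, not the tutorial's Keras code):

```python
import numpy as np

def actor_critic_losses(log_probs, values, returns):
    """Illustrative per-step losses: the critic regresses the return,
    the actor is pushed toward actions with positive advantage."""
    advantages = returns - values                  # how much better than expected
    actor_loss = -np.mean(log_probs * advantages)  # policy-gradient term
    critic_loss = np.mean(advantages ** 2)         # value-regression term
    return actor_loss, critic_loss
```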
Asynchronous Actor Critic (A3C)
In this tutorial I will provide an implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm in TensorFlow and Keras. We will use it to solve a simple challenge in the Pong environment!
Policy Optimization (PPO)
In this tutorial we'll dive into the PPO architecture and implement a Proximal Policy Optimization (PPO) agent that learns to play Pong-v0.
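The core of PPO is the clipped surrogate objective; here is a small NumPy sketch of it (illustrative only, not the tutorial's exact Keras loss):

```python
import numpy as np

def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO's clipped surrogate: keep the probability ratio close to 1 so
    each policy update stays near the behaviour policy."""
    ratio = np.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))  # maximize this (negate for a loss)
```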