Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Using Deep Q-Network to Learn How To Play Flappy Bird

7 mins version: DQN for flappy bird

Overview

This project follows the Deep Q-Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that the same learning algorithm can be further generalized to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those who are interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to capacity N
Initialize action-value function Q with random weights θ
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ε select a random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in the emulator and observe reward r_t and state s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                    for terminal s_(j+1)
            r_j + γ * max_a' Q(s_(j+1), a'; θ)     for non-terminal s_(j+1)
        Perform a gradient step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
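
To make the target definition concrete, below is a minimal NumPy sketch (not the repository's TensorFlow code) of how y_j is computed for a sampled minibatch; the discount factor and the toy numbers are assumptions for illustration only.

import numpy as np

GAMMA = 0.99  # assumed discount factor for this sketch

def compute_targets(q_next, rewards, terminals, gamma=GAMMA):
    """Compute y_j for a minibatch.

    q_next    : (batch, num_actions) Q-values for s_(j+1) from the network
    rewards   : (batch,) immediate rewards r_j
    terminals : (batch,) booleans, True if s_(j+1) is terminal
    """
    # y_j = r_j for terminal transitions,
    # y_j = r_j + gamma * max_a' Q(s_(j+1), a') otherwise
    return rewards + gamma * np.max(q_next, axis=1) * (~terminals)

# toy usage with made-up numbers
q_next = np.array([[0.2, 1.5], [0.7, 0.1]])
rewards = np.array([0.1, -1.0])
terminals = np.array([False, True])
print(compute_targets(q_next, rewards, terminals))  # [ 1.585 -1.   ]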

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster. This process can be visualized in the following figure:

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (a short preprocessing sketch follows the list):

  1. Convert image to grayscale
  2. Resize image to 80x80
  3. Stack the last 4 frames to produce an 80x80x4 input array for the network
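
Below is a minimal OpenCV/NumPy sketch of these three steps; the exact frame size coming out of pygame and any extra thresholding done in the repository may differ.

import cv2
import numpy as np

def preprocess(frame):
    """Convert an RGB game frame to an 80x80 grayscale image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)   # step 1: grayscale
    return cv2.resize(gray, (80, 80))                # step 2: resize to 80x80

def stack_frames(frames):
    """Stack the last 4 preprocessed frames into an 80x80x4 network input."""
    assert len(frames) == 4
    return np.stack(frames, axis=2)                  # step 3: 80x80x4

# usage: feed a dummy RGB frame four times (frame size here is an assumption)
dummy = np.zeros((512, 288, 3), dtype=np.uint8)
state = stack_frames([preprocess(dummy)] * 4)
print(state.shape)  # (80, 80, 4)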

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.

The final output layer has the same dimensionality as the number of valid actions in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function for each valid action given the input state. At each time step, the network selects the action with the highest Q value under an ε-greedy policy.
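
As a concrete reference, here is a minimal tf.keras sketch of the architecture described above. The repository itself builds the network with TensorFlow 0.7's low-level ops, so the layer API, padding choices, and default initializers below are assumptions rather than a copy of the original code.

import tensorflow as tf

NUM_ACTIONS = 2  # Flappy Bird: do nothing / flap

def build_q_network(num_actions=NUM_ACTIONS):
    """80x80x4 stacked frames in, one Q-value per valid action out."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(80, 80, 4)),
        tf.keras.layers.Conv2D(32, 8, strides=4, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(num_actions),  # linear Q-values, index 0 = do nothing
    ])

model = build_q_network()
model.summary()

With "same" padding, the three convolution/pooling stages in this sketch shrink the 80x80 input down to 1x1x64 before the 256-unit fully connected layer.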

Training

At first, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set the replay memory to a maximum size of 50,000 experiences.

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.
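
A minimal sketch of this warm-up phase, assuming a deque-based replay memory and a hypothetical env.step interface (the repository drives its own pygame wrapper instead):

import random
from collections import deque

REPLAY_MEMORY = 50000   # maximum number of stored transitions
OBSERVE = 10000         # random-action steps before any weight update

replay_memory = deque(maxlen=REPLAY_MEMORY)

def warm_up(env, num_actions):
    """Fill the replay memory with random-policy transitions; no training yet."""
    state = env.reset()
    for _ in range(OBSERVE):
        action = random.randrange(num_actions)
        next_state, reward, terminal = env.step(action)  # hypothetical interface
        replay_memory.append((state, action, reward, next_state, terminal))
        state = env.reset() if terminal else next_state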

Note that unlike [1], which initializes ε to 1, I linearly anneal ε from 0.1 to 0.0001 over the next 3,000,000 frames. The reason is that the agent can choose an action every 0.03 s (FPS = 30) in our game, so a high ε makes it flap too often, keeps it at the top of the screen, and eventually makes it bump into a pipe in a clumsy way. This makes the Q function converge relatively slowly, since the agent only starts to visit other situations once ε is low. In other games, however, initializing ε to 1 is more reasonable.
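
The linear annealing schedule described here can be written directly; the constants below match the FAQ section:

INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.0001
EXPLORE = 3000000  # frames over which epsilon is annealed

def epsilon_at(frame):
    """Linearly anneal epsilon over EXPLORE frames after the observe phase."""
    if frame <= 0:
        return INITIAL_EPSILON
    if frame >= EXPLORE:
        return FINAL_EPSILON
    return INITIAL_EPSILON - (INITIAL_EPSILON - FINAL_EPSILON) * frame / EXPLORE

print(epsilon_at(0), epsilon_at(1500000), epsilon_at(3000000))
# 0.1  0.05005  0.0001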

During training, at each time step the network samples a minibatch of size 32 from the replay memory and performs a gradient step on the loss function described above, using the Adam optimizer with a learning rate of 1e-6. After annealing finishes, the network continues to train indefinitely with ε fixed at 0.0001.
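
Putting these pieces together, here is a hedged sketch of a single training step using tf.keras and Adam with a learning rate of 1e-6. The original code builds the equivalent loss and optimizer with TensorFlow 0.7 ops, so the names and shapes here are illustrative rather than the repository's implementation.

import numpy as np
import tensorflow as tf

BATCH_SIZE = 32
GAMMA = 0.99  # assumed discount factor for this sketch
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-6)

def train_step(model, minibatch):
    """One gradient step on (y_j - Q(s_j, a_j))^2 for a sampled minibatch."""
    states, actions, rewards, next_states, terminals = zip(*minibatch)
    states = np.asarray(states, dtype=np.float32)
    next_states = np.asarray(next_states, dtype=np.float32)
    actions = np.asarray(actions, dtype=np.int32)
    rewards = np.asarray(rewards, dtype=np.float32)
    terminals = np.asarray(terminals, dtype=bool)

    # targets are computed outside the tape, so no gradient flows through y_j
    q_next = model(next_states).numpy()
    targets = (rewards + GAMMA * q_next.max(axis=1) * (~terminals)).astype(np.float32)

    with tf.GradientTape() as tape:
        q = model(states)                                               # (32, num_actions)
        q_taken = tf.reduce_sum(q * tf.one_hot(actions, q.shape[-1]), axis=1)
        loss = tf.reduce_mean(tf.square(q_taken - targets))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return float(loss)

# usage: minibatch = random.sample(replay_memory, BATCH_SIZE); train_step(model, minibatch)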

FAQ

Checkpoint not found

Change the first line of saved_networks/checkpoint to

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518:529-533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird Report | Youtube result

Disclaimer

This work is based heavily on the following repos:

  1. [sourabhv/FlapPyBird](https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames