Solutions of Reinforcement Learning 2nd Edition

Overview

Solutions of Reinforcement Learning 2nd Edition (Original Book by Richard S. Sutton,Andrew G. Barto)

How to contribute and current situation (9/11/2021~)

I have been working as a full-time AI engineer and barely have free time to manage this project any more. I want to make a simple guidance of how to response to contributions:

For exercises that have no answer yet, (for example, chapter 12)

  1. Prepare your latex code, make sure it works and looks somewhat nice.
  2. Send you code to [email protected]. By default, I will put contributer's name in the pdf file, besides the exercise. You can be anoymous as well just noted in the email.
  3. I will update the corresponding solution pdf.

For solution that you think is wrong, but it is trivial to change:

  1. Ask in issues. If there are multiple confirmations and reports to the same issue, I will change the excercise. (the pass rate of such issue is around 30%)

For solution that you think is wrong or incomplete, but it is hard to say that in issue.

Follow the first steps (just as if this exercise has no solution)

I know there is an automatic-ish commit and contribute to pdf procedure, but from the number of contributions, I decide to pass it on. (currently only 2% is contributed by person other than me)

Now I am more concentrated on computer vision and have less time contributing to the interest (RL). But I do hope and think RL is the future subject that will be on the top of AI pyramid one day and I will come back. Thanks for all your supports and best wishes to your own careers.

Those students who are using this to complete your homework, stop it. This is written for serving millions of self-learners who do not have official guide or proper learning environment. And, Of Course, as a personal project, it has ERRORS. (Contribute to issues if you find any).

Welcome to this project. It is a tiny project where we don't do too much coding (yet) but we cooperate together to finish some tricky exercises from famous RL book Reinforcement Learning, An Introduction by Sutton. You may know that this book, especially the second version which was published last year, has no official solution manual. If you send your answer to the email address that the author leaved, you will be returned a fake answer sheet that is incomplete and old. So, why don't we write our own? Most of problems are mathematical proof in which one can learn the therotical backbone nicely but some of them are quite challenging coding problems. Both of them will be updated gradually but math will go first.

Main author would be me and current main cooperater is Jean Wissam Dupin, and before was Zhiqi Pan (quitted now).

Main Contributers for Error Fixing:

burmecia's Work (Error Fix and code contribution)

Chapter 3: Ex 3.4, 3.5, 3.6, 3.9, 3.19

Chapter4: Ex 4.7 Code(in Julia)

Jean's Work (Error Fix):

Chapter 3: Ex 3.8, 3.11, 3.14, 3.23, 3.24, 3.26, 3.28, 3.29, 4.5

QihuaZhong's Work (Error fix, analysis)

Ex 6.11, 5.11, 10.5, 10.6

luigift's Work (Error fix, algorithm contribution)

Ex 10.4 10.6 10.7 Ex 12.1 (alternative solution)

Other people (Error Fix):

Ex 10.2 SHITIANYU-hue Ex 10.6 10.7 Mohammad Salehi

ABOUT MISTAKES:

Don't even expect the solutions be perfect, there are always mistakes. Especially in Chapter 3, where my mind was in a rush there. And, sometimes the problems are just open. Show your ideas and question them in 'issues' at any time!

Let's roll'n out!

UPDATE LOG:

Will update and revise this repo after 2021 April

[UPDATE APRIL 2020] After implementing Ape-X and D4PG in my another project, I will go back to this project and at least finish the policy gradient chapter.

[UPDATE MAR 2020] Chapter 12 almost finished and is updated, except for the last 2 questions. One for dutch trace and one for double expected SARSA. They are tricker than other exercises and I will update them little bit later. Please share your ideas by opening issues if you already hold a valid solution.**

[UPDATE MAR 2020] Due to multiple interviews ( it is interview season in japan ( despite the virus!)), I have to postpone the plan of update to March or later, depending how far I could go. (That means I am doing leetcode-ish stuff every day)

[UPDATE JAN 2020] Future works will NOT be stopped. I will try to finish it in FEB 2020.

[UPDATE JAN 2020] Chapter 12's ideas are not so hard but questions are very difficult. (most chanllenging one in this book ). As far, I have finished up to Ex 12.5 and I think my answer of Ex 12.1 is the only valid one on the internet (or not, challenge welcomed!) But because later half is even more challenging (tedious when it is related to many infiite sums), I would release the final version little bit later.

[UPDATE JAN 2020] Chapter 11 updated. One might have to read the referenced link to Sutton's paper in order to understand some part. Espeically how and why Emphatic-TD works.

[UPDATE JAN 2020] Chapter 10 is long but interesting! Move on!

[UPDATE DEC 2019] Chapter 9 takes long time to read thoroughly but practices are surprisingly just a few. So after uploading the Chapter 9 pdf and I really do think I should go back to previous chapters to complete those programming practices.

Chapter 12

[Updated March 27] Almost finished.

CHAPTER 12 SOLUTION PDF HERE

Chapter 11

Major challenges about off-policy learning. Like Chapter 9, practices are short.

CHAPTER 11 SOLUTION PDF HERE

Chapter 10

It is a substantial complement to Chapter 9. Still many open problems which are very interesting.

CHAPTER 10 SOLUTION PDF HERE

Chapter 9

Long chapter, short practices.

CHAPTER 9 SOLUTION PDF HERE

Chapter 8

Finished without programming. Plan on creating additional exercises to this Chapter because many materials are lack of practice.

CHAPTER 8 SOLUTION PDF HERE

Chapter 7

Finished without programming. Thanks for help from Zhiqi Pan.

CHAPTER 7 SOLUTION PDF HERE

Chapter 6

Fully finished.

CHAPTER 6 SOLUTION PDF HERE

Chapter 5

Partially finished.

CHAPTER 5 SOLUTION PDF HERE

Chapter 4

Finished. Ex4.7 Partially finished. Dat DP question will burn my mind and macbook but I encourage any one who cares nothing about that trying to do yourself. Running through it forces you remember everything behind ordinary DP.:)

CHAPTER 4 SOLUTION PDF HERE

Chapter 3 (I was in a rush in this chapter. Be aware about strange answers if any.)

CHAPTER 3 SOLUTION PDF HERE

Owner
YIFAN WANG
RL & TENSOR Now CV + NLP.
YIFAN WANG
Finetuning Pipeline

KLUE Baseline Korean(한국어) KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark. See our paper fo

74 Dec 13, 2022
This is the source code of the 1st place solution for segmentation task (with Dice 90.32%) in 2021 CCF BDCI challenge.

1st place solution in CCF BDCI 2021 ULSEG challenge This is the source code of the 1st place solution for ultrasound image angioma segmentation task (

Chenxu Peng 30 Nov 22, 2022
The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track.

ISC21-Descriptor-Track-1st The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track. You can check our solution

lyakaap 73 Dec 24, 2022
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

A tour through tensorflow with financial data I present several models ranging in complexity from simple regression to LSTM and policy networks. The s

195 Dec 07, 2022
PyVideoAI: Action Recognition Framework

This reposity contains official implementation of: Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognitio

Kiyoon Kim 22 Dec 29, 2022
Inkscape extensions for figure resizing and editing

Academic-Inkscape: Extensions for figure resizing and editing This repository contains several Inkscape extensions designed for editing plots. Scale P

192 Dec 26, 2022
Machine learning framework for both deep learning and traditional algorithms

NeoML is an end-to-end machine learning framework that allows you to build, train, and deploy ML models. This framework is used by ABBYY engineers for

NeoML 704 Dec 27, 2022
Denoising images with Fourier Ring Correlation loss

Denoising images with Fourier Ring Correlation loss The python code accompanies the working manuscript Image quality measurements and denoising using

2 Mar 12, 2022
A solution to ensure Crowd Management with Contactless and Safe systems.

CovidTrack A Solution to ensure Crowd Management with Contactless and Safe systems. ML Model Mask Detection Social Distancing Detection Analytics Page

Om Khare 1 Nov 10, 2021
Generative Adversarial Networks(GANs)

Generative Adversarial Networks(GANs) Vanilla GAN ClusterGAN Vanilla GAN Model Structure Final Generator Structure A MLP with 2 hidden layers of hidde

Zhenbang Feng 2 Nov 05, 2021
Justmagic - Use a function as a method with this mystic script, like in Nim

justmagic Use a function as a method with this mystic script, like in Nim. Just

witer33 8 Oct 08, 2022
Progressive Coordinate Transforms for Monocular 3D Object Detection

Progressive Coordinate Transforms for Monocular 3D Object Detection This repository is the official implementation of PCT. Introduction In this paper,

58 Nov 06, 2022
Code accompanying the paper on "An Empirical Investigation of Domain Generalization with Empirical Risk Minimizers" published at NeurIPS, 2021

Code for "An Empirical Investigation of Domian Generalization with Empirical Risk Minimizers" (NeurIPS 2021) Motivation and Introduction Domain Genera

Meta Research 15 Dec 27, 2022
An easy way to build PyTorch datasets. Modularly build datasets and automatically cache processed results

EasyDatas An easy way to build PyTorch datasets. Modularly build datasets and automatically cache processed results Installation pip install git+https

Ximing Yang 4 Dec 14, 2021
商品推荐系统

商品top50推荐系统 问题建模 本项目的数据集给出了15万左右的用户以及12万左右的商品, 以及对应的经过脱敏处理的用户特征和经过预处理的商品特征,旨在为用户推荐50个其可能购买的商品。 推荐系统架构方案 本项目采用传统的召回+排序的方案。

107 Dec 29, 2022
“Data Augmentation for Cross-Domain Named Entity Recognition” (EMNLP 2021)

Data Augmentation for Cross-Domain Named Entity Recognition Authors: Shuguang Chen, Gustavo Aguilar, Leonardo Neves and Thamar Solorio This repository

<a href=[email protected]"> 18 Sep 10, 2022
Pretraining on Dynamic Graph Neural Networks

Pretraining on Dynamic Graph Neural Networks Our article is PT-DGNN and the code is modified based on GPT-GNN Requirements python 3.6 Ubuntu 18.04.5 L

7 Dec 17, 2022
Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving

Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving This is the source code for our paper Frequency Domain Image Tran

Mu Cai 52 Dec 23, 2022
RL agent to play μRTS with Stable-Baselines3

Gym-μRTS with Stable-Baselines3/PyTorch This repo contains an attempt to reproduce Gridnet PPO with invalid action masking algorithm to play μRTS usin

Oleksii Kachaiev 24 Nov 11, 2022