An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
Object Moderation Layer

django-oml Welcome to the documentation for django-oml! OML means Object Moderation Layer, the idea is to have a mixin model that allows you to modera

Angel Velásquez 12 Aug 22, 2019
Connect-4-AI - AI that plays Connect-4 using the minimax algorithm

Connect-4-AI Brief overview I coded up the Connect-4 (or four-in-a-row) game in

Favour Okeke 1 Feb 15, 2022
examify-io is an online examination system that offers automatic grading , exam statistics , proctoring and programming tests , multiple user roles

examify-io is an online examination system that offers automatic grading , exam statistics , proctoring and programming tests , multiple user roles ( Examiner , Supervisor , Student )

Ameer Nasser 4 Oct 28, 2021
This project is an open-source project which I made due to sharing my experience around the Python programming language.

django-tutorial This project is an open-source project which I made due to sharing my experience around the Django framework. What is Django? Django i

MohammadMasoumi 6 May 12, 2022
A Python inplementation for OAuth2

OAuth2-Python Discord Inplementation for OAuth2 login systems. This is a simple Python 'app' made to inplement in your programs that require (shitty)

Prifixy 0 Jan 06, 2022
Ready-to-use and customizable users management for FastAPI

FastAPI Users Ready-to-use and customizable users management for FastAPI Documentation: https://frankie567.github.io/fastapi-users/ Source Code: https

François Voron 2.4k Jan 04, 2023
A simple model based API maker written in Python and based on Django and Django REST Framework

Fast DRF Fast DRF is a small library for making API faster with Django and Django REST Framework. It's easy and configurable. Full Documentation here

Mohammad Ashraful Islam 18 Oct 05, 2022
Beihang University Network Authentication Login

北航自动网络认证使用说明 主文件 gw_buaa.py # @file gw_buaa.py # @author Dong # @date 2022-01-25 # @email windcicada 0 Jul 22, 2022

Flask user session management.

Flask-Login Flask-Login provides user session management for Flask. It handles the common tasks of logging in, logging out, and remembering your users

Max Countryman 3.2k Dec 28, 2022
Get inside your stronghold and make all your Django views default login_required

Stronghold Get inside your stronghold and make all your Django views default login_required Stronghold is a very small and easy to use django app that

Mike Grouchy 384 Nov 23, 2022
Python library for generating a Mastercard API compliant OAuth signature.

oauth1-signer-python Table of Contents Overview Compatibility References Usage Prerequisites Adding the Library to Your Project Importing the Code Loa

23 Aug 01, 2022
An extension of django rest framework, providing a configurable password reset strategy

Django Rest Password Reset This python package provides a simple password reset strategy for django rest framework, where users can request password r

Anexia 363 Dec 24, 2022
Django Rest Framework App wih JWT Authentication and other DRF stuff

Django Queries App with JWT authentication, Class Based Views, Serializers, Swagger UI, CI/CD and other cool DRF stuff API Documentaion /swagger - Swa

Rafael Salimov 4 Jan 29, 2022
Django Authetication with Twitch.

Django Twitch Auth Dependencies Install requests if not installed pip install requests Installation Install using pip pip install django_twitch_auth A

Leandro Lopes Bueno 1 Jan 02, 2022
row level security for FastAPI framework

Row Level Permissions for FastAPI While trying out the excellent FastApi framework there was one peace missing for me: an easy, declarative way to def

Holger Frey 315 Dec 25, 2022
Provide OAuth2 access to your app

django-oml Welcome to the documentation for django-oml! OML means Object Moderation Layer, the idea is to have a mixin model that allows you to modera

Caffeinehit 334 Jul 27, 2022
Out-of-the-box support register, sign in, email verification and password recovery workflows for websites based on Django and MongoDB

Using djmongoauth What is it? djmongoauth provides out-of-the-box support for basic user management and additional operations including user registrat

hao 3 Oct 21, 2021
python-social-auth and oauth2 support for django-rest-framework

Django REST Framework Social OAuth2 This module provides OAuth2 social authentication support for applications in Django REST Framework. The aim of th

1k Dec 22, 2022
Per object permissions for Django

django-guardian django-guardian is an implementation of per object permissions [1] on top of Django's authorization backend Documentation Online docum

3.3k Jan 01, 2023
Easy and secure implementation of Azure AD for your FastAPI APIs 🔒 Single- and multi-tenant support.

Easy and secure implementation of Azure AD for your FastAPI APIs 🔒 Single- and multi-tenant support.

Intility 220 Jan 05, 2023