An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
A simple model based API maker written in Python and based on Django and Django REST Framework

Fast DRF Fast DRF is a small library for making API faster with Django and Django REST Framework. It's easy and configurable. Full Documentation here

Mohammad Ashraful Islam 18 Oct 05, 2022
Social auth made simple

Python Social Auth Python Social Auth is an easy-to-setup social authentication/registration mechanism with support for several frameworks and auth pr

Matías Aguirre 2.8k Dec 24, 2022
Foundation Auth Proxy is an abstraction on Foundations' authentication layer and is used to authenticate requests to Atlas's REST API.

foundations-auth-proxy Setup By default the server runs on http://0.0.0.0:5558. This can be changed via the arguments. Arguments: '-H' or '--host': ho

Dessa - Open Source 2 Jul 03, 2020
Python library for generating a Mastercard API compliant OAuth signature.

oauth1-signer-python Table of Contents Overview Compatibility References Usage Prerequisites Adding the Library to Your Project Importing the Code Loa

23 Aug 01, 2022
Auth for use with FastAPI

FastAPI Auth Pluggable auth for use with FastAPI Supports OAuth2 Password Flow Uses JWT access and refresh tokens 100% mypy and test coverage Supports

David Montague 95 Jan 02, 2023
The ultimate Python library in building OAuth, OpenID Connect clients and servers. JWS,JWE,JWK,JWA,JWT included.

Authlib The ultimate Python library in building OAuth and OpenID Connect servers. JWS, JWK, JWA, JWT are included. Authlib is compatible with Python2.

Hsiaoming Yang 3.4k Jan 04, 2023
Graphical Password Authentication System.

Graphical Password Authentication System. This is used to increase the protection/security of a website. Our system is divided into further 4 layers of protection. Each layer is totally different and

Hassan Shahzad 12 Dec 16, 2022
Django Rest Framework App wih JWT Authentication and other DRF stuff

Django Queries App with JWT authentication, Class Based Views, Serializers, Swagger UI, CI/CD and other cool DRF stuff API Documentaion /swagger - Swa

Rafael Salimov 4 Jan 29, 2022
API with high performance to create a simple blog and Auth using OAuth2 ⛏

DogeAPI API with high performance built with FastAPI & SQLAlchemy, help to improve connection with your Backend Side to create a simple blog and Cruds

Yasser Tahiri 111 Jan 05, 2023
Toolkit for Pyramid, a Pylons Project, to add Authentication and Authorization using Velruse (OAuth) and/or a local database, CSRF, ReCaptcha, Sessions, Flash messages and I18N

Apex Authentication, Form Library, I18N/L10N, Flash Message Template (not associated with Pyramid, a Pylons project) Uses alchemy Authentication Authe

95 Nov 28, 2022
A Python tool to generate and refresh Amazon access tokens.

amazon_auth A Python tool to generate and refresh Amazon access tokens. Description This tool generates and outputs Amazon access and refresh tokens f

15 Nov 21, 2022
Library - Recent and favorite documents

Thingy Thingy is used to quickly access recent and favorite documents. It's an XApp so it can work in any distribution and many desktop environments (

Linux Mint 23 Sep 11, 2022
Corsair_scan is a security tool to test Cross-Origin Resource Sharing (CORS).

Welcome to Corsair_scan Corsair_scan is a security tool to test Cross-Origin Resource Sharing (CORS) misconfigurations. CORS is a mechanism that allow

Santander Security Research 116 Nov 09, 2022
Social auth made simple

Python Social Auth Python Social Auth is an easy-to-setup social authentication/registration mechanism with support for several frameworks and auth pr

Matías Aguirre 2.8k Dec 24, 2022
OAuth2 goodies for the Djangonauts!

Django OAuth Toolkit OAuth2 goodies for the Djangonauts! If you are facing one or more of the following: Your Django app exposes a web API you want to

Jazzband 2.7k Dec 31, 2022
row level security for FastAPI framework

Row Level Permissions for FastAPI While trying out the excellent FastApi framework there was one peace missing for me: an easy, declarative way to def

Holger Frey 315 Dec 25, 2022
This program automatically logs you into a Zoom session at your alloted time

This program automatically logs you into a Zoom session at your alloted time. Optionally you can choose to have end the session at your allotted time.

9 Sep 19, 2022
Simple extension that provides Basic, Digest and Token HTTP authentication for Flask routes

Flask-HTTPAuth Simple extension that provides Basic and Digest HTTP authentication for Flask routes. Installation The easiest way to install this is t

Miguel Grinberg 1.1k Jan 05, 2023
Integrated set of Django applications addressing authentication, registration, account management as well as 3rd party (social) account authentication.

Welcome to django-allauth! Integrated set of Django applications addressing authentication, registration, account management as well as 3rd party (soc

Raymond Penners 7.7k Jan 01, 2023
User-related REST API based on the awesome Django REST Framework

Django REST Registration User registration REST API, based on Django REST Framework. Documentation Full documentation for the project is available at

Andrzej Pragacz 399 Jan 03, 2023