
The Markov Decision Process (MDP)

[Markov Decision Process - Wikipedia]

- Overview

A Markov Decision Process (MDP) is a mathematical framework for modeling sequential decision-making in dynamic systems. It applies when outcomes are partly random and partly under the control of a decision maker.

MDPs are discrete-time stochastic control processes. The model is built around a decision maker, or agent, who inhabits an environment whose state changes stochastically in response to the agent's actions. At each time step the agent observes the current state, chooses an action, and the environment moves to a new state and returns a reward.

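An MDP consists of four essential elements: a set of states, a set of actions, a model (the transition probabilities that describe how each action moves the system from one state to another), and rewards (the immediate payoff for taking an action in a state).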
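The four elements can be written down directly as data. The following is a minimal Python sketch, not a reference implementation: the two states, two actions, probabilities, and reward values are all hypothetical and chosen only for illustration, as is the helper name `step`.

```python
import random

# Hypothetical two-state MDP; every name and number here is illustrative.
STATES = ["low", "high"]
ACTIONS = ["wait", "harvest"]

# Model: TRANSITIONS[state][action] is a list of (next_state, probability).
TRANSITIONS = {
    "low": {
        "wait": [("high", 0.7), ("low", 0.3)],
        "harvest": [("low", 1.0)],
    },
    "high": {
        "wait": [("high", 1.0)],
        "harvest": [("low", 0.9), ("high", 0.1)],
    },
}

# Rewards: REWARDS[state][action] is the immediate reward.
REWARDS = {
    "low": {"wait": 0.0, "harvest": 1.0},
    "high": {"wait": 0.0, "harvest": 5.0},
}


def step(state, action, rng=random):
    """Sample the next state from the model and return (next_state, reward)."""
    next_states, probs = zip(*TRANSITIONS[state][action])
    next_state = rng.choices(next_states, weights=probs, k=1)[0]
    return next_state, REWARDS[state][action]
```

Calling `step("low", "wait")` samples one transition: the agent receives a reward of 0.0 and ends up in "high" with probability 0.7 under these assumed numbers.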

The agent's goal is to learn a policy that dictates the action to be taken in each state to maximize cumulative rewards. 
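To make "policy" and "cumulative rewards" concrete, here is a small Python sketch. It assumes a discount factor `gamma` (a common convention that the text above does not specify), and the state and action names are hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting the reward at step t by gamma**t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# A deterministic policy is a lookup from state to action.
# The state and action names are hypothetical.
policy = {"low": "wait", "high": "harvest"}

# Rewards 1, 1, 1 with gamma = 0.5 give 1 + 0.5 + 0.25.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

With `gamma` below 1, rewards received sooner count for more than the same rewards received later, which keeps the sum finite over long horizons.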

Most reinforcement learning (RL) problems can be formulated as MDPs. Real-world application areas include harvesting, agriculture, and water resource management.

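The Bellman equation is a fundamental equation in dynamic programming and reinforcement learning. It defines the optimal value function of an MDP recursively: the value of a state equals the best achievable immediate reward plus the value of the state that follows. The equation is named after Richard Bellman, who introduced it in the 1950s.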
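One common way to solve the Bellman equation numerically is value iteration, which repeatedly applies the update V(s) = max over a of the sum over s' of P(s' | s, a) * (R(s, a) + gamma * V(s')) until the values stop changing. The sketch below is illustrative only: the function names, the discount factor `gamma`, the tolerance `tol`, and the two-state example MDP are all assumptions, not part of the text above.

```python
def q_value(s, a, V, P, R, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (R[s][a] + gamma * V[s2]) for s2, p in P[s][a])


def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality update until the largest change < tol."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(s, a, V, P, R, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy: in each state pick the action with the highest value.
    policy = {
        s: max(actions, key=lambda a: q_value(s, a, V, P, R, gamma))
        for s in states
    }
    return V, policy


# Hypothetical deterministic MDP: P[s][a] is a list of (next_state, prob).
states = ["A", "B"]
actions = ["stay", "go"]
P = {
    "A": {"stay": [("A", 1.0)], "go": [("B", 1.0)]},
    "B": {"stay": [("B", 1.0)], "go": [("A", 1.0)]},
}
R = {
    "A": {"stay": 0.0, "go": 1.0},
    "B": {"stay": 2.0, "go": 0.0},
}

V, policy = value_iteration(states, actions, P, R, gamma=0.5)
```

For this example the values converge to approximately V(A) = 3 and V(B) = 4, with the greedy policy choosing "go" in A and "stay" in B. This can be checked by hand: staying in B forever yields 2 / (1 - 0.5) = 4, and going from A to B yields 1 + 0.5 * 4 = 3.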

[More to come ...]
