# The Markov Decision Process (MDP)

## Overview

A Markov Decision Process (MDP) is a mathematical framework that models decision-making for dynamic systems. It is used when outcomes are partly random and partly under the control of a decision maker.

MDPs are discrete-time stochastic control processes. They model decision making in discrete, stochastic, sequential environments. The model is based on a decision maker, or agent, who inhabits an environment that changes state randomly in response to the agent's actions.

MDPs consist of four essential elements: a set of states, a set of actions, a transition model (the probability of reaching each next state given the current state and action), and a reward function.
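As a concrete illustration, these four elements can be written down directly as data. The sketch below defines a tiny two-state MDP; the state names, actions, probabilities, and rewards are all hypothetical values invented for this example:

```python
# A minimal two-state MDP written as plain dictionaries.
# All names and numbers here are hypothetical, chosen purely for illustration.

states = ["sunny", "rainy"]
actions = ["water", "wait"]

# transitions[(state, action)] -> {next_state: probability}
transitions = {
    ("sunny", "water"): {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "wait"):  {"sunny": 0.6, "rainy": 0.4},
    ("rainy", "water"): {"sunny": 0.3, "rainy": 0.7},
    ("rainy", "wait"):  {"sunny": 0.5, "rainy": 0.5},
}

# rewards[(state, action)] -> immediate reward
rewards = {
    ("sunny", "water"): 5.0,
    ("sunny", "wait"):  1.0,
    ("rainy", "water"): -1.0,
    ("rainy", "wait"):  0.0,
}

# Sanity check: each transition distribution must sum to 1.
for dist in transitions.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

Any real problem would substitute its own states, actions, and numbers, but the structure — states, actions, transition probabilities, rewards — is the same.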

The agent's goal is to learn a policy that dictates the action to be taken in each state to maximize cumulative rewards.

MDPs can model most reinforcement learning (RL) problems. Real-world applications include crop harvesting, agricultural planning, and water-resource management.

The Bellman equation is a fundamental recursive equation in dynamic programming and reinforcement learning that defines the optimal value function of an MDP. It is named after Richard Bellman, who proposed it in the 1950s.
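In one common form, the Bellman optimality equation states that the optimal value of a state equals the best achievable sum of immediate reward and discounted expected value of the next state: V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s' | s, a) V*(s') ]. Applying this update repeatedly until the values stop changing is known as value iteration. Below is a minimal sketch on a hypothetical two-state MDP (all names and numbers invented for illustration):

```python
# Value iteration: repeatedly apply the Bellman optimality update
#   V(s) <- max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s') ]
# on a hypothetical two-state MDP (all numbers invented for illustration).

states = ["sunny", "rainy"]
actions = ["water", "wait"]
transitions = {
    ("sunny", "water"): {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "wait"):  {"sunny": 0.6, "rainy": 0.4},
    ("rainy", "water"): {"sunny": 0.3, "rainy": 0.7},
    ("rainy", "wait"):  {"sunny": 0.5, "rainy": 0.5},
}
rewards = {
    ("sunny", "water"): 5.0, ("sunny", "wait"): 1.0,
    ("rainy", "water"): -1.0, ("rainy", "wait"): 0.0,
}
gamma = 0.9  # discount factor

def q_value(s, a, V):
    """One-step lookahead: immediate reward plus discounted expected value."""
    return rewards[(s, a)] + gamma * sum(
        p * V[s2] for s2, p in transitions[(s, a)].items()
    )

V = {s: 0.0 for s in states}
for _ in range(200):  # iterate until (approximately) converged
    V = {s: max(q_value(s, a, V) for a in actions) for s in states}

# The optimal policy acts greedily with respect to the converged values.
policy = {s: max(actions, key=lambda a: q_value(s, a, V)) for s in states}
print(V, policy)
```

Because gamma < 1, each sweep is a contraction, so the values converge to the unique solution of the Bellman equation, and the greedy policy extracted from them is optimal for this MDP.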

**[More to come ...]**