Intro
A policy, π, is an agent’s strategy which determines which actions to take in any given state. It acts as a mapping from environment state to action and can be deterministic () or stochastic ()
Quote
![]()
In reinforcement learning (RL), a policy is a strategy or rule that defines the agent’s behavior. It maps states (the agent’s observations) to actions or probabilities of actions. To those familiar with supervised learning, you can think of a policy as a type of model. In supervised learning, you train a model to map inputs to outputs based on labeled data. In RL, think of a policy as a model that tells the agent what action to take in order to maximize long-term cumulative reward
Types of Policies
Deterministic Policy
A deterministic policy always picks the same action for a given state:
Example:
- In a game: always move right when at position 2
Stochastic Policy
A stochastic policy assigns probabilities to actions:
Example:
- 70% move right
- 30% move left



