Intro

A policy, π, is an agent’s strategy for choosing an action in any given state. It maps environment states to actions and can be deterministic ($π(s) = a$) or stochastic ($π(a ∣ s)$, a probability distribution over actions given the state).

Quote

In reinforcement learning (RL), a policy is a strategy or rule that defines the agent’s behavior. It maps states (the agent’s observations) to actions or probabilities of actions. To those familiar with supervised learning, you can think of a policy as a type of model. In supervised learning, you train a model to map inputs to outputs based on labeled data. In RL, think of a policy as a model that tells the agent what action to take in order to maximize long-term cumulative reward.

Types of Policies

Deterministic Policy

A deterministic policy always picks the same action for a given state:

$π(s) = a$

Example:

In a game: always move right when at position 2
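As a minimal sketch of the example above, a deterministic policy can be written as a plain lookup table from state to action. The 1-D game, its positions 0 to 3, the action names, and the helper `deterministic_policy` are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical 1-D game: states are integer positions, actions are strings.
# A deterministic policy maps each state to exactly one action: pi(s) = a.
POLICY: Dict[int, str] = {0: "right", 1: "right", 2: "right", 3: "left"}


def deterministic_policy(state: int) -> str:
    """Return the single action the policy prescribes for this state."""
    return POLICY[state]


# Calling the policy repeatedly on the same state always gives the same action.
actions_at_2 = [deterministic_policy(2) for _ in range(5)]
print(actions_at_2)
```

Because the mapping is fixed, no randomness is involved: the agent at position 2 moves right every time.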

Stochastic Policy

A stochastic policy assigns probabilities to actions:

$π(a ∣ s)$

Example:

70% move right
30% move left
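The 70/30 split above can be sketched as sampling from a probability table for a single state. The names `ACTION_PROBS` and `stochastic_policy` are assumptions made for this illustration, and the seed is fixed only so the run is repeatable.

```python
import random
from typing import Dict

# pi(a | s) for one hypothetical state: the probabilities sum to 1.
ACTION_PROBS: Dict[str, float] = {"right": 0.7, "left": 0.3}


def stochastic_policy(probs: Dict[str, float], rng: random.Random) -> str:
    """Sample one action according to the given action probabilities."""
    actions = list(probs)
    weights = list(probs.values())
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [stochastic_policy(ACTION_PROBS, rng) for _ in range(10_000)]
fraction_right = samples.count("right") / len(samples)
print(f"fraction of 'right' actions: {fraction_right:.3f}")
```

Over many samples the fraction of "right" actions should land close to 0.7, while any individual call may return either action.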
