Intro

A policy, π, is the agent's strategy for choosing an action in any given state. It maps environment states to actions and can be deterministic (a = π(s)) or stochastic (π(a | s), a probability distribution over actions).

Quote

In reinforcement learning (RL), a policy is a strategy or rule that defines the agent’s behavior. It maps states (the agent’s observations) to actions or to probabilities of actions. If you are familiar with supervised learning, you can think of a policy as a type of model. In supervised learning, you train a model to map inputs to outputs based on labeled data. In RL, the policy is the model that tells the agent which action to take in order to maximize long-term cumulative reward.

Types of Policies

Deterministic Policy

A deterministic policy always picks the same action for a given state: a = π(s).

Example:

  • In a game: always move right when at position 2
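A deterministic policy over a small, discrete state space can be sketched as a plain lookup table. The snippet below is only an illustration: the integer positions, the action names, and the `act` helper are hypothetical and do not come from any particular library or game.

```python
# Hypothetical toy setup: states are integer positions, actions are strings.
# Each state maps to exactly one action, which is what makes the policy deterministic.
DETERMINISTIC_POLICY = {
    0: "right",
    1: "right",
    2: "right",  # "always move right when at position 2"
    3: "left",
}


def act(state: int) -> str:
    """Return the single action the policy prescribes for `state`."""
    return DETERMINISTIC_POLICY[state]


# Querying the same state repeatedly always yields the same action.
actions_at_2 = {act(2) for _ in range(5)}
```

Because the mapping is a pure function of the state, `actions_at_2` contains a single element no matter how many times the policy is queried.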

Stochastic Policy

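A stochastic policy assigns a probability to each available action in a given state, written π(a | s); the agent samples its action from this distribution.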

Example:

  • 70% move right
  • 30% move left
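The 70/30 split above can be sketched as a per-state probability table from which the agent samples. This is a minimal illustration under stated assumptions: attaching the distribution to position 2, the action names, and the `sample_action` helper are all hypothetical choices made for the example.

```python
import random

# Hypothetical toy setup: for each state, a probability for every action.
# The probabilities for a state must sum to 1.
STOCHASTIC_POLICY = {
    2: {"right": 0.7, "left": 0.3},
}


def sample_action(state: int, rng: random.Random) -> str:
    """Draw one action from the policy's distribution for `state`."""
    distribution = STOCHASTIC_POLICY[state]
    actions = list(distribution)
    weights = list(distribution.values())
    return rng.choices(actions, weights=weights, k=1)[0]


# Sampling many times: the empirical frequency should approach 0.7.
rng = random.Random(0)  # seeded so repeated runs produce the same draws
samples = [sample_action(2, rng) for _ in range(10_000)]
right_fraction = samples.count("right") / len(samples)
```

Unlike a deterministic policy, two queries for the same state can return different actions; only the long-run frequencies are fixed by the policy.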