Intro

Proximal Policy Optimization

PPO defines the advantage by asking

A_t = How much better than average was the action that we took?