Intro
Proximal Policy Optimization
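PPO defines the advantage A_t by asking:
A_t = how much better than average was the action that we took?
Here "average" means the return we would expect from that state when following the current policy.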
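To make the question concrete, here is a minimal sketch of one way to turn it into numbers. It assumes the "average" is a value estimate V(s_t) for each visited state and that "how well the action did" is the discounted return observed from that step onward, so A_t = G_t - V(s_t). The notes do not say which estimator is meant, and PPO implementations commonly use a smoothed variant (generalized advantage estimation) instead. The function names, the gamma default, and the sample numbers are hypothetical and chosen only for illustration.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards: List[float], values: List[float], gamma: float = 0.99) -> List[float]:
    """Assumed estimator: A_t = G_t - V(s_t), observed return minus the baseline."""
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical three-step episode with a constant value estimate of 1.0.
    print(advantages([1.0, 0.0, 2.0], [1.0, 1.0, 1.0], gamma=0.5))
```

A positive entry means the action led to more return than the baseline predicted for that state, a negative entry means less, and zero means it did exactly as well as average.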