Policy Gradient Methods: REINFORCE, Actor-Critic, A3C, and A2C

May 23, 2026

The previous post derived DQN, DDQN, and Dueling DQN. These value-based methods learn a Q-function and follow a policy that maximizes it. This approach works well for discrete action spaces but cannot be applied directly when actions are continuous, since the maximization step is no longer proces...