reinforcement-learning

Policy Gradient Methods: REINFORCE, Actor-Critic, A3C, and A2C

May 23, 2026

The previous post derived DQN, DDQN, and Dueling DQN. These value-based methods learn a Q-function and follow a policy that maximizes it. This approach works well for discrete action spaces but cannot be applied directly when actions are continuous, since the maximization step is no longer proces...

Balancing a Double Pendulum with DQN and MuJoCo

March 31, 2026

A double pendulum consists of two pendulums attached end to end and is a classic physical system that exhibits complex, chaotic motion. Balancing a double pendulum using only a single motor on the first joint is a well-known benchmark in control theory and robotics. This po...