python

Policy Gradient Methods: REINFORCE, Actor-Critic, A3C, and A2C

May 23, 2026

The previous post derived DQN, DDQN, and Dueling DQN. These value-based methods learn a Q-function and follow a policy that maximizes it. This approach works well for discrete action spaces but cannot be applied directly when actions are continuous, since the maximization step is no longer proces...
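The maximization step can be made concrete with a minimal Python sketch. It is not taken from the post. With a finite action set, the greedy policy is an argmax over a list of Q-values. With a continuous action, the same step becomes an optimization problem that has to be solved at every decision. The toy function `q_continuous` is hypothetical. The grid search is only one crude workaround, and its cost grows quickly with action resolution and dimension. It is not the approach the post takes.

```python
def greedy_action(q_values):
    """Discrete case: index of the largest Q-value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_continuous(state, action):
    """Hypothetical toy Q-function over a scalar action in [-1, 1], peaked at 0.3."""
    return -(action - 0.3) ** 2


def grid_argmax(q_fn, state, low=-1.0, high=1.0, steps=201):
    """Continuous case, approximated by brute-force search over an evenly spaced grid.

    The answer is only as accurate as the grid spacing, and the number of
    evaluations grows exponentially if the action has more dimensions.
    """
    grid = [low + (high - low) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda a: q_fn(state, a))


if __name__ == "__main__":
    print(greedy_action([0.1, 0.7, 0.3]))    # discrete: one argmax over three values
    print(grid_argmax(q_continuous, None))   # continuous: 201 Q evaluations per decision
```

With three discrete actions the greedy choice costs three comparisons. The continuous version needs many Q evaluations per decision and still only approximates the true maximizer.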

Balancing a Double Pendulum with DQN and MuJoCo

March 31, 2026

A double pendulum consists of two pendulums attached to each other and is a classic physical system that exhibits complex, chaotic motion. Balancing a double pendulum using only a single motor on the first joint is a well-known benchmark problem in control theory and robotics. This po...

Getting Started with MuJoCo on macOS

March 4, 2026

MuJoCo (Multi-Joint dynamics with Contact) is a physics engine originally developed by Emo Todorov and now maintained by Google DeepMind. It is widely used in robotics, reinforcement learning research, and biomechanics. This post covers installation on macOS, importing a model, and running simulations i...

Sangil Lee
