The previous post derived DQN, DDQN, and Dueling DQN. These value-based methods learn Q-function and follow a policy that maximizes Q-function. This approach works well for discrete action spaces but cannot be applied when actions are continuous, since the maximization step is no longer proces...
A double pendulum consists of two pendulums attached to each other, and is a classic physical system that exhibits complex and chaotic motion. The balancing problem of double pendulum using only a single motor on the first joint is a well-known benchmark in control theory and robotics. This po...
MuJoCo (Multi-Joint dynamics with Contact) is a physics engine mainly developed by Emo Todorov and maintained by Google DeepMind. It is widely used in robotics, reinforcement learning research, and biomechanics. This post covers installation on macOS, importing model, and running simulations i...