Benchmark data for continuous-control reinforcement learning (RL) on the DeepMind Control Suite and MuJoCo.
All baseline algorithms are run with code from one of three repositories: ① the Spinning Up repository, ② the Fujimoto TD3 repository, or ③ the QingLi Implementation.
The baseline algorithms are:
- Deep Deterministic Policy Gradients (DDPG)
- Proximal Policy Optimization (PPO)
- Soft Actor-Critic (SAC)
- Twin Delayed Deep Deterministic Policy Gradients (TD3)
Learning curves can be plotted with `spinupUtils/plot.py`, for example:

```bash
# Example: `-l` sets the curve labels and `-s` sets the smoothing value.
python spinupUtils/plot.py \
    MuJoCo-3M/SpinningUp/DDPG/DDPG-Hopper-v2 \
    MuJoCo-3M/SpinningUp/PPO/PPO-Hopper-v2 \
    MuJoCo-3M/SpinningUp/TD3/TD3-Hopper-v2 \
    MuJoCo-3M/SpinningUp/SAC/SAC-Hopper-v2 \
    --env Hopper-v2 \
    -l DDPG PPO TD3 SAC -s 10
```

MuJoCo tasks (3 million time steps): Ant-v2, HalfCheetah-v2, Hopper-v2, Humanoid-v2, Swimmer-v2, Walker2d-v2.
- Baselines from the Spinning Up repository; agents are trained for 3 million time steps.
- Baselines from the QingLi Implementation; agents are trained for 3 million time steps.
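The plotting example above covers Hopper-v2 only. Assuming the other 3-million-step Spinning Up runs follow the same `MuJoCo-3M/SpinningUp/<ALGO>/<ALGO>-<env>` layout (only the Hopper-v2 paths are shown in this README), a small helper can build the equivalent command for any task listed above. The helper `plot_command` is hypothetical, and the folder name used for QingLi Implementation runs is not documented here, so only the `SpinningUp` root is assumed.

```python
import shlex
from typing import List, Sequence

# Algorithms plotted in the example command, in the same order.
ALGOS = ("DDPG", "PPO", "TD3", "SAC")


def plot_command(env: str,
                 algos: Sequence[str] = ALGOS,
                 root: str = "MuJoCo-3M/SpinningUp",
                 smooth: int = 10) -> List[str]:
    """Build the argv for spinupUtils/plot.py for one environment.

    Hypothetical helper: assumes every run directory follows the
    <root>/<ALGO>/<ALGO>-<env> pattern seen for Hopper-v2.
    """
    # One run directory per algorithm, e.g. MuJoCo-3M/SpinningUp/SAC/SAC-Ant-v2.
    run_dirs = [f"{root}/{algo}/{algo}-{env}" for algo in algos]
    # Labels (-l) are the algorithm names, in the same order as the directories.
    return ["python", "spinupUtils/plot.py", *run_dirs,
            "--env", env,
            "-l", *algos,
            "-s", str(smooth)]


if __name__ == "__main__":
    # Print a shell-ready command for Walker2d-v2.
    print(shlex.join(plot_command("Walker2d-v2")))
```

Returning an argument list rather than a single string keeps it usable with `subprocess.run` as well as for printing.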
MuJoCo tasks (1 million time steps): Ant-v2, HalfCheetah-v2, Hopper-v2, Humanoid-v2, Swimmer-v2, Walker2d-v2.
- Baselines from the Fujimoto TD3 repository; agents are trained for 1 million time steps by default.
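Whichever group is plotted, the `-s` flag in the plotting example sets the smoothing value. Assuming `spinupUtils/plot.py` follows Spinning Up's `plot.py`, which smooths each curve with a moving-window average whose width is the `-s` value and divides by the number of points actually inside the window near the edges, the effect can be sketched in pure Python. The helper `smooth_curve` is hypothetical, and placing even-width windows one step further into the past is this sketch's own choice.

```python
from typing import List, Sequence


def smooth_curve(values: Sequence[float], window: int) -> List[float]:
    """Centered moving-window average with edge renormalization.

    Hypothetical helper illustrating the idea behind the `-s` value;
    not the repository's actual code.
    """
    if window <= 1:
        return [float(v) for v in values]  # a width of 1 means no smoothing
    n = len(values)
    half = window // 2
    smoothed = []
    for i in range(n):
        # Window covers indices [i - half, i - half + window - 1], clipped to the data.
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        chunk = values[lo:hi]
        # Divide by the points actually in the window, so the first and
        # last points of the curve are not pulled toward zero.
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    episode_returns = [0.0, 10.0, 20.0, 30.0, 40.0]
    print(smooth_curve(episode_returns, 3))  # [5.0, 10.0, 20.0, 30.0, 35.0]
```

Larger `-s` values therefore flatten short-term noise in the learning curves at the cost of blurring sudden changes in performance.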
DeepMind Control Suite tasks (3 million time steps): acrobot-swingup, ball_in_cup-catch, cartpole-swingup, cartpole-swingup_sparse, cartpole-three_poles, cartpole-two_poles, cheetah-run, finger-spin, finger-turn_easy, finger-turn_hard, fish-swim, hopper-hop, hopper-stand, humanoid-run, humanoid-run_pure_state, humanoid-stand, pendulum-swingup, point_mass-easy, point_mass-hard, quadruped-fetch, quadruped-run, quadruped-walk, swimmer-swimmer6, swimmer-swimmer15, walker-run.
- Baselines from the Spinning Up repository; agents are trained for 3 million time steps.
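Each DeepMind Control Suite task above is written as `domain-task`. Because domain and task names use underscores internally (`ball_in_cup`, `run_pure_state`), the first hyphen separates the two parts. The sketch below assumes that convention holds for every listed name; `split_dmc_task` is a hypothetical helper, not part of this repository.

```python
from typing import Tuple


def split_dmc_task(name: str) -> Tuple[str, str]:
    """Split a 'domain-task' benchmark name into (domain, task).

    Hypothetical helper: assumes the first '-' is the separator, since
    domain and task names use underscores rather than hyphens.
    """
    domain, sep, task = name.partition("-")
    if not sep or not domain or not task:
        raise ValueError(f"expected 'domain-task', got {name!r}")
    return domain, task


if __name__ == "__main__":
    for name in ("cartpole-swingup_sparse", "ball_in_cup-catch",
                 "humanoid-run_pure_state"):
        print(split_dmc_task(name))
```

With `dm_control` installed, the resulting pair could be passed to `suite.load(domain_name=domain, task_name=task)`; this README does not say whether the benchmark scripts load environments that way.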
```bibtex
@misc{QingLi2021continuousbenchmark,
  author = {Qing Li},
  title = {Continuous Control Benchmark of DeepMind Control Suite and MuJoCo},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LQNew/Continuous_Control_Benchmark}}
}
```



