Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Official implementation of ACC, described in the paper "Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning". The source code is based on the PyTorch implementation of TQC, which in turn is based on the implementation of TD3. We thank the authors for making their source code publicly available.
Requirements
Install MuJoCo
- Download and install MuJoCo 1.50 from the MuJoCo website. We assume that the MuJoCo files are extracted to the default location (~/.mujoco/mjpro150).
- Copy your MuJoCo license key (mjkey.txt) to ~/.mujoco/mjkey.txt.
Install
We recommend using an Anaconda environment. In our experiments we used Python 3.7 and the following dependencies:
pip install gym==0.17.2 mujoco-py==1.50.1.68 numpy==1.19.1 torch==1.6.0 torchvision==0.7.0
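As a quick sanity check (not part of the official setup) that the dependencies and the MuJoCo bindings import correctly, you can try creating one of the environments:

python -c "import torch, mujoco_py, gym; gym.make('HalfCheetah-v3'); print('setup ok')"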
Running ACC
You can run ACC for TQC on one of the gym continuous control environments by calling
python main.py --env "HalfCheetah-v3" --max_timesteps 5000000 --seed 0
To run the data-efficient variant with 4 critic update steps per environment step, call
python main.py --env "HalfCheetah-v3" --max_timesteps 1000000 --num_critic_updates 4 --seed 0
Example scripts that run the experiments for 10 seeds on all environments are provided in run_experiment.sh and run_experiment_data_efficient.sh; a sketch of such a loop is shown below.
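run_experiment.sh is included in the repository; as a rough illustration, such a script can be structured as a loop over seeds and environments (the environment list below is only an example and is not necessarily the one used in the paper):

for seed in $(seq 0 9); do
    for env in "HalfCheetah-v3" "Hopper-v3" "Walker2d-v3" "Ant-v3"; do
        # Full-length runs; for the data-efficient variant add
        # --num_critic_updates 4 and reduce --max_timesteps accordingly.
        python main.py --env "$env" --max_timesteps 5000000 --seed "$seed"
    done
done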
You can speed up the experiments by using fewer networks in the TQC ensemble. This trades a small amount of performance for a faster runtime (see the Appendix of the paper). The number of networks can be controlled with the flag --n_nets. For example:
python main.py --env "HalfCheetah-v3" --max_timesteps 5000000 --n_nets 2 --seed 0