Files in this directory: README.md, __init__.py, ddpg.py, ddpg_step.py, main.py, test.py


DDPG Algorithm

The core update code of the algorithm is in ddpg_step.py.
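For orientation, the two computations that a DDPG update typically revolves around can be sketched in plain Python. This is not the code in ddpg_step.py: the function names td_target and polyak_update are hypothetical, scalars and lists stand in for tensors and network parameters, and the polyak convention (target = polyak * target + (1 - polyak) * online) is an assumption based on the --polyak help text.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """Critic regression target: y = r + gamma * (1 - done) * Q_target(s', mu_target(s')).

    next_q stands for the target critic's value at the next state, evaluated
    at the action chosen by the target policy (assumed to be computed elsewhere).
    """
    return reward + gamma * (1.0 - float(done)) * next_q


def polyak_update(target_params, online_params, polyak=0.995):
    """Soft update: move each target parameter slightly toward its online counterpart.

    Assumed convention: a polyak value close to 1 means the target changes slowly.
    """
    return [polyak * t + (1.0 - polyak) * s
            for t, s in zip(target_params, online_params)]


# A terminal transition ignores the bootstrapped value.
y_terminal = td_target(reward=1.0, next_q=5.0, done=True, gamma=0.9)
# A non-terminal transition adds the discounted target-critic value.
y_nonterminal = td_target(reward=1.0, next_q=5.0, done=False, gamma=0.9)

# With polyak=0.5 the target moves halfway toward the online parameters.
new_target = polyak_update([0.0, 1.0], [1.0, 1.0], polyak=0.5)
```

In a full update, the critic would be fit to these targets on a replay batch, the policy would be updated to maximize the critic's value at its own actions, and the soft update would then be applied to both target networks.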

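1. Training

The main code for training DDPG is in main.py. It uses click to parse command-line arguments, so parameters can also be configured from the command line. Run python -m PolicyGradient.DDPG.main --help to list all parameters: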

Options:
  --env_id TEXT              Environment Id
  --render BOOLEAN           Render environment or not
  --num_process INTEGER      Number of process to run environment
  --lr_p FLOAT               Learning rate for Policy Net
  --lr_v FLOAT               Learning rate for Value Net
  --gamma FLOAT              Discount factor
  --polyak FLOAT             Interpolation factor in polyak averaging for
                             target networks
  --explore_size INTEGER     Explore steps before execute deterministic policy
  --memory_size INTEGER      Size of replay memory
  --step_per_iter INTEGER    Number of steps of interaction in each iteration
  --batch_size INTEGER       Batch size
  --min_update_step INTEGER  Minimum interacts for updating
  --update_step INTEGER      Steps between updating policy and critic
  --max_iter INTEGER         Maximum iterations to run
  --eval_iter INTEGER        Iterations to evaluate the model
  --save_iter INTEGER        Iterations to save the model
  --action_noise FLOAT       Std for noise of action
  --model_path TEXT          Directory to store model
  --log_path TEXT            Directory to save logs
  --seed INTEGER             Seed for reproducing
  --help                     Show this message and exit.
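How --explore_size, --min_update_step and --update_step might interact is sketched below. The reading is an assumption drawn only from the help text above (random actions for the first explore_size steps, no updates before min_update_step, then one update every update_step steps); main.py may schedule things differently, and plan_step is a hypothetical helper.

```python
def plan_step(t, explore_size, min_update_step, update_step):
    """Decide what happens at global interaction step t (0-based).

    Returns (action_source, do_update):
      action_source is "random" during the initial exploration phase and
      "policy" afterwards; do_update says whether networks are updated.
    This schedule is an assumed interpretation of the CLI help text.
    """
    action_source = "random" if t < explore_size else "policy"
    do_update = t >= min_update_step and t % update_step == 0
    return action_source, do_update


# Small illustrative values, not the repository's defaults.
schedule = [
    plan_step(t, explore_size=3, min_update_step=4, update_step=2)
    for t in range(8)
]
```

Under these illustrative values, the first three steps act randomly and updates begin at step 4, repeating every second step.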

The action_noise option is there to increase the policy's exploration: noise is drawn from a normal distribution with this standard deviation.
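A minimal sketch of this kind of exploration noise, using only the standard library. The helper noisy_action is hypothetical, and clipping the result to the action bounds is an assumption (common practice, but not stated in this README).

```python
import random


def noisy_action(action, action_noise, low, high, rng=random):
    """Add zero-mean Gaussian noise with std action_noise to each action dimension.

    Clipping to [low, high] is an assumption; the README only states that
    the noise follows a normal distribution.
    """
    noisy = [a + rng.gauss(0.0, action_noise) for a in action]
    return [min(max(a, low), high) for a in noisy]


# A seeded generator keeps the sketch reproducible.
rng = random.Random(0)
sample = noisy_action([0.0, 0.9], action_noise=0.1, low=-1.0, high=1.0, rng=rng)

# With zero noise the deterministic action is returned unchanged.
unchanged = noisy_action([0.3], action_noise=0.0, low=-1.0, high=1.0, rng=rng)
```

A larger action_noise spreads actions further from the deterministic policy output, trading off exploration against the quality of the collected transitions.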

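2. Testing

Trained models are saved under trained_models. Run test.py to load the corresponding model and evaluate its performance; its command-line arguments are largely the same as those of main.py.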
