We adapted this repo from DeepMind's Melting Pot for the paper *Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization*. All additional implementations are located in `./MARL/`. We also created some customized substrate configurations, located in `./meltingpot/python/configs/substrates/`.
- `SP_train_origin.py`: DeepMind's sample self-play training code
- `SP_train.py`: custom self-play training
- `PP_train.py`: custom population-play training
- `SP_eval_train.py`: custom self-play training (1 seed), used by the tournament evaluation
- `index_evaluate.py`: index evaluation from checkpoints produced by `SP_train_origin.py`
- `tournament.py`: generates the tournament heat map
- `correlation.ipynb`: calculates the correlation between index and performance
- `bars.ipynb`: visualizes the tournament bar plots
- `index.ipynb`: visualizes the index
- `index_seed.ipynb`: visualizes the effect of the LoI approximation on resource allocation performance
- `tournament.ipynb`: visualizes the tournament heat map
- `learning_curves_SP.ipynb`: custom self-play training curves
- `learning_curves_SP_eval.ipynb`: custom self-play (1 seed) training curves
- `learning_curves_PP.ipynb`: custom population-play training curves
- `learning_curves_tune.ipynb`: DeepMind sample training curves
- `resource_allocation.ipynb`: visualizes the resource allocation comparison
Run names follow the pattern `{E}{S}{C}` (e.g. `sh2l` = stag hunt, seed 2, large):

- `{E}` for environment:
  - `c`: chicken
  - `pc`: pure coordination
  - `pd`: prisoners dilemma
  - `sh`: stag hunt
- `{C}` for configuration size:
  - `s`: small
  - `m`: medium
  - `l`: large
  - `o`: obstacle
- `{S}` for seed number
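For illustration, the naming convention above can be captured in a small parser. This helper is hypothetical (not part of the repo); `parse_run_name` and its regular expression are assumptions built from the code tables above:

```python
import re

# Codes taken from the naming convention above.
ENVS = {"c": "chicken", "pc": "pure coordination",
        "pd": "prisoners dilemma", "sh": "stag hunt"}
SIZES = {"s": "small", "m": "medium", "l": "large", "o": "obstacle"}

def parse_run_name(name):
    """Split an {E}{S}{C} run name into (environment, seed, size)."""
    m = re.fullmatch(r"(c|pc|pd|sh)(\d+)(s|m|l|o)", name)
    if m is None:
        raise ValueError(f"not a valid {{E}}{{S}}{{C}} name: {name}")
    env, seed, size = m.groups()
    return ENVS[env], int(seed), SIZES[size]

print(parse_run_name("sh2l"))  # -> ('stag hunt', 2, 'large')
```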
- `SP_checkpoints`: custom self-play checkpoints
- `SP_eval_checkpoints`: custom self-play (1 seed) checkpoints
- `PP_checkpoints`: custom population-play checkpoints
- `~/ray_results/PPO/`: DeepMind demo self-play checkpoints

Folder structure: `{METHOD}_checkpoints/{E}{S}{C}/seed_{S}/gen_{GEN}`
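Under this layout, the latest saved generation for a given seed can be found by scanning the `gen_{GEN}` subfolders. A minimal sketch (the `latest_gen` helper is hypothetical, not part of the repo):

```python
import os
import re

def latest_gen(seed_dir):
    """Return the highest generation number among gen_{GEN} subfolders
    of a seed directory such as SP_checkpoints/sh2l/seed_0."""
    gens = [int(m.group(1))
            for entry in os.listdir(seed_dir)
            if (m := re.fullmatch(r"gen_(\d+)", entry))]
    return max(gens) if gens else None
```

For example, `latest_gen("SP_checkpoints/sh2l/seed_0")` would return the newest `{GEN}`, or `None` if no generation has been saved yet.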
- `SP_outputs`: custom self-play outputs
- `SP_eval_outputs`: custom self-play (1 seed) outputs
- `PP_outputs`: custom population-play outputs

File name: `{E}{S}{C}.txt`
- `SP_logs`: custom self-play logs
- `SP_eval_logs`: custom self-play (1 seed) logs
- `PP_logs`: custom population-play logs

File name: `{E}{S}{C}.npz`

NumPy file structure:

- `data['timesteps']`: shape (# seeds, # steps)
- `data['policy_reward_min']`: shape (# seeds, agent id, # steps)
- `data['policy_reward_mean']`: shape (# seeds, agent id, # steps)
- `data['policy_reward_max']`: shape (# seeds, agent id, # steps)
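A minimal sketch of reading one of these `.npz` logs with NumPy. The key names and shapes follow the listing above; the file contents here are synthetic stand-in data, since real logs are produced by the training scripts:

```python
import numpy as np

# Synthetic stand-in for a real log such as SP_logs/sh2l.npz:
# 2 seeds, 2 agents, 5 logged steps.
np.savez("demo_log.npz",
         timesteps=np.zeros((2, 5)),
         policy_reward_min=np.zeros((2, 2, 5)),
         policy_reward_mean=np.zeros((2, 2, 5)),
         policy_reward_max=np.zeros((2, 2, 5)))

data = np.load("demo_log.npz")
reward_mean = data["policy_reward_mean"]    # (# seeds, agent id, # steps)

# Average over seeds to get one learning curve per agent.
per_agent_curve = reward_mean.mean(axis=0)  # (agent id, # steps)
print(per_agent_curve.shape)                # -> (2, 5)
```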
- `plots_bar`: tournament bar plots and resource allocation
- `plots_index`: index plots
- `plots_seed`: the effect of the LoI approximation on resource allocation performance
- `plots_tournament`: tournament heat map plots
- `plots_SP`: custom self-play training curves
- `plots_SP_eval`: custom self-play (1 seed) training curves
- `plots_PP`: custom population-play training curves
- `plots_tune`: DeepMind sample training curves
- `index_data`: index `.npz` files
- `tournament_data`: tournament `.npz` files
To train from scratch:

- Create folders for logs (`{METHOD}_logs`), outputs (`{METHOD}_outputs/{E}{S}{C}.txt`), and checkpoints (`{METHOD}_checkpoints/{E}{S}{C}`)
- Change the `substrate_name` string in the function `get_config()`
- Change `save_path`, `checkpoints_path`, and `log_path` in the function `main()`
- Change `num_gens` and `seeds` in the function `main()`
- Change `continuous_training` to `False`
To resume (continuous) training:

- Clean the output `.txt` file back to the nearest `###` separator line
- Delete excess generation folders `gen_{GEN}` in the checkpoints folder to make sure the latest generation is the same across all seeds
- Change the `substrate_name` string in the function `get_config()`
- Change `save_path`, `checkpoints_path`, and `log_path` in the function `main()`
- Change `num_gens` and `seeds` in the function `main()`
- Change `continuous_training` to `True`
- Change `starting_gen` to the latest generation number in the checkpoints folder
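The log-cleaning step might look like the following sketch, assuming each completed generation in the output `.txt` ends with a line containing only `###`; both the helper name and the exact separator format are assumptions, not part of the repo:

```python
def trim_log_to_last_separator(path, sep="###"):
    """Truncate the output .txt so it ends at the last complete
    '###' separator line, dropping any half-written generation."""
    with open(path) as f:
        lines = f.readlines()
    # Scan backwards for the last separator-only line.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == sep:
            break
    else:
        return  # no separator at all; leave the file untouched
    with open(path, "w") as f:
        f.writelines(lines[:i + 1])
```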
- Create a `tmux` session: `tmux new -s meltingpot`, then `cd meltingpot`
- Run the training code: `python {METHOD}_train.py`