Recipes

This directory contains recipes for iterative preference optimization.

To run a recipe, export the options it needs as environment variables, then execute the run.sh script in that recipe's subdirectory. For example:

# Options
export MAX_ITER=14
export LOG_ROOT="./log"
export PROJECT_NAME="alignment/iterative"
export EXPERIMENT_NAME="ultrafeedback-self_instruct-pair_rm"
export SELF_INSTRUCT_STAGE_ADDITIONAL_ARGS=""
export GENERATE_RESPONSES_STAGE_ADDITIONAL_ARGS=""
export PAIR_RM_STAGE_ADDITIONAL_ARGS=""
export DPO_STAGE_ADDITIONAL_ARGS="runner.training_args.beta=0.1" # Optional, overrides default config

# Execute
bash configs/experiments/recipes/ultrafeedback-self_instruct-pair_rm/run.sh

To debug a single recipe, run the debug.sh script in its subdirectory. To debug every recipe at once, run the debug.sh script in this directory, which executes the debug.sh script of each subdirectory.
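
As a rough illustration of what "run every debug.sh in the subdirectories" can look like, here is a minimal sketch. It is not the repository's actual debug.sh: the function name run_all_debug, the optional root-directory argument, and the choice to stop at the first failing script are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run the debug.sh of every immediate subdirectory.
# run_all_debug and its stop-on-first-failure policy are assumptions,
# not taken from the repository's own scripts.
run_all_debug() {
  local root="${1:-.}"   # directory to scan; defaults to the current one
  local script
  for script in "$root"/*/debug.sh; do
    # When no subdirectory has a debug.sh, the glob stays unexpanded; skip it.
    [ -f "$script" ] || continue
    echo "Running $script"
    # Stop at the first failing recipe so the error is easy to spot.
    bash "$script" || return 1
  done
}
```

Calling run_all_debug with no argument scans the current directory. Passing a path scans that directory instead.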