
Added saving of best checkpoint #1858 (Closed)

Conversation

@sugeeth14 commented Aug 31, 2020

With reference to issue #1856, this adds saving of the best checkpoint based on perplexity.

@francoishernandez (Member) commented Sep 1, 2020

Not sure this is the best way to do it:

  • the model is actually saved twice, doubling the I/O
  • you only check for ppl, and probably not in the best place in the code

Can't we just have some flag in the EarlyStopping class that would prevent deleting the current best model when cleaning up?
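A minimal sketch of what such a flag could look like, assuming a simplified early stopper (the names EarlyStopper, best_step, and tolerance are illustrative, not OpenNMT-py's actual EarlyStopping API):

```python
# Minimal sketch: track which checkpoint step is the current best, so the
# cleanup step can skip it instead of copying the model. Hypothetical names.

class EarlyStopper:
    def __init__(self, tolerance):
        self.tolerance = tolerance   # validations without improvement allowed
        self.stalled = 0
        self.best_ppl = float("inf")
        self.best_step = None        # step of the current best checkpoint

    def __call__(self, valid_ppl, step):
        """Update state after a validation; return True when training should stop."""
        if valid_ppl < self.best_ppl:
            self.best_ppl = valid_ppl
            self.best_step = step    # cleanup should spare this checkpoint
            self.stalled = 0
        else:
            self.stalled += 1
        return self.stalled >= self.tolerance
```

Checkpoint cleanup could then compare each file it is about to delete against best_step and leave that one in place.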

@sugeeth14 (Author) commented Sep 1, 2020

  1. It is inspired by fairseq, where the best checkpoint is also saved alongside the original checkpoint, though it is a copy: https://github.com/pytorch/fairseq/blob/983163494663e24b611f1ba8d5d47a3edc00e2e5/fairseq/checkpoint_utils.py#L60

When we use -keep_checkpoint, since only the last n checkpoints are saved, there is a possibility that the best one is lost. So isn't it a good idea to have the best checkpoint saved, so that we can resume training from the last checkpoints and also have the best one? (See the sketch after this list.)

  2. I wanted to check for validation loss to save the best, like fairseq, but then again we would need to validate at that point. In fairseq's case, validation and saving are done at the same time in all cases, even when using --save-interval-updates, so I thought of sticking to ppl.
    https://fairseq.readthedocs.io/en/latest/command_line_tools.html#Checkpointing
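For comparison, the fairseq behaviour referenced in point 1 boils down to copying the freshly saved checkpoint to a fixed "best" path whenever validation perplexity improves. A rough sketch, assuming a checkpoint_best.pt naming convention and an in-memory best_ppl tracker (neither taken from fairseq or OpenNMT-py code):

```python
import os
import shutil

best_ppl = float("inf")

def maybe_copy_best(ckpt_path, valid_ppl, save_dir):
    """fairseq-style: keep a copy of the best checkpoint alongside the
    rolling checkpoints. Note the model ends up on disk twice, which is
    the doubled I/O objection raised above."""
    global best_ppl
    if valid_ppl < best_ppl:
        best_ppl = valid_ppl
        shutil.copyfile(ckpt_path, os.path.join(save_dir, "checkpoint_best.pt"))
```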

Your thoughts?

@francoishernandez (Member)

  1. My point is that you don't need to copy the model; you can just "not remove it" when cleaning up to keep the last n (a sketch follows this list). And you can make a link to keep track of which checkpoint is currently the best in your training folder.
    As for resuming training, I'm not sure how to handle this anyway, since you won't have the metric of the "previous best" to compare to your new training metrics. Unless you embed it in the checkpoints; and if you do that, you can also embed the flag that says which model is the best, to prevent removing it.

  2. The cleanest way is probably to have something similar: force validation every -save_checkpoint_steps in the case of early stopping. Either way, this code does not belong where it is now.
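A sketch of the "do not remove it" cleanup from point 1, with a symlink recording which checkpoint is currently the best (the helper and argument names are made up for illustration):

```python
import os
from collections import deque

def cleanup_checkpoints(queue, new_ckpt, best_ckpt, keep_checkpoint):
    """Keep only the last keep_checkpoint checkpoints, but never delete
    the current best one; queue is a collections.deque of file paths."""
    queue.append(new_ckpt)
    while len(queue) > keep_checkpoint:
        victim = queue.popleft()
        if victim == best_ckpt:
            continue  # spare the best model instead of keeping a copy of it
        os.remove(victim)
    # a symlink records which checkpoint is currently the best
    link = os.path.join(os.path.dirname(best_ckpt), "checkpoint_best.pt")
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(os.path.basename(best_ckpt), link)
```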

@sugeeth14 (Author) commented Sep 2, 2020

  1. You think forcing validation every -save_checkpoint_steps just in the case of early stopping is enough? As I understand it, I would change it here, https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/trainer.py#L275, so that it validates every save_checkpoint_steps.

  2. In the case of model_saver.py, I want to add a lookup in a JSON file best_checkpoint.json having the values

{
  "best_checkpoint_step": step,
  "validation_ppl": ppl
}

and not delete that checkpoint when removing checkpoints, updating the file every time a new best checkpoint comes out of (2). This way, the info on which checkpoint in the directory is the best is saved for later lookup. Would that be okay? (A sketch follows below.)
Can you suggest where the new changes would be appropriate?
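A sketch of that bookkeeping, using the JSON layout above (the function names are hypothetical; they would sit near the cleanup logic in model_saver.py):

```python
import json
import os

BEST_INFO = "best_checkpoint.json"

def update_best_info(save_dir, step, ppl):
    """Overwrite best_checkpoint.json whenever a new best checkpoint appears."""
    info = {"best_checkpoint_step": step, "validation_ppl": ppl}
    with open(os.path.join(save_dir, BEST_INFO), "w") as f:
        json.dump(info, f, indent=2)

def is_best_checkpoint(save_dir, step):
    """Check a candidate-for-deletion step against the recorded best."""
    path = os.path.join(save_dir, BEST_INFO)
    if not os.path.exists(path):
        return False
    with open(path) as f:
        return json.load(f)["best_checkpoint_step"] == step
```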

@francoishernandez (Member)

> You think forcing validation every -save_checkpoint_steps just in the case of early stopping is enough? As I understand it, I would change it here, https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/trainer.py#L275, so that it validates every save_checkpoint_steps.

Sure, you can check whether self.earlystopper is None, and if not, you also have the save_checkpoint_steps value available.

Not sure about the json file, but it wouldn't hurt, I presume. As for where: you can probably put this in trainer.py. Retrieve a flag when self.earlystopper is called, and add a condition on this flag when model_saver is called a few lines down.
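Wiring this together on the trainer side might look roughly as follows; the loop shape, the attribute names (valid_steps, save_checkpoint_steps, earlystopper, model_saver), and the is_best keyword approximate OpenNMT-py's trainer.py rather than quote it, and the sketch reuses the hypothetical EarlyStopper from above:

```python
def maybe_validate_and_save(trainer, step):
    """Force validation every save_checkpoint_steps when early stopping is
    enabled, then pass an is_best flag down to the model saver."""
    do_validate = step % trainer.valid_steps == 0
    if trainer.earlystopper is not None:
        # validate at every save point, so a 'best' decision always exists
        do_validate = do_validate or step % trainer.save_checkpoint_steps == 0

    is_best = False
    if do_validate:
        valid_ppl = trainer.validate()
        trainer.earlystopper(valid_ppl, step)
        is_best = trainer.earlystopper.best_step == step

    if step % trainer.save_checkpoint_steps == 0:
        # the flag tells the saver which checkpoint to spare during cleanup
        trainer.model_saver.save(step, is_best=is_best)
```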

@francoishernandez (Member)

@raghava14 please note I just merged #1835, hence the conflicts.

@sugeeth14 (Author)

Hi, I created the PR here: #1859. Hope it is okay; please review.
