keep_checkpoint option removes best performing model #1946

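With the `keep_checkpoint` option we can specify how many checkpoints should be kept. However, checkpoints are only tracked in the order in which they were saved and are never ranked by performance: once the limit is reached, the oldest checkpoint is deleted. That means that if your best-performing model was saved early in training, it is removed anyway as soon as `keep_checkpoint` newer checkpoints have been saved.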

https://github.com/OpenNMT/OpenNMT-py/blob/073428849c1d10dd4fae7f8fd92699cdc9f230a4/onmt/models/model_saver.py#L79-L83
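To make the failure mode concrete, here is a minimal stand-alone simulation of the FIFO logic in the linked lines. It assumes, as the `.maxlen` and `.popleft()` calls suggest, that `checkpoint_queue` is a `collections.deque` with `maxlen=keep_checkpoint`. The checkpoint names and validation losses are invented for illustration, and appending to a `removed` list stands in for `self._rm_checkpoint`.

```python
from collections import deque
from typing import List, Tuple


def simulate_fifo(saves: List[Tuple[str, float]], keep_checkpoint: int):
    """Replay the current eviction logic: the oldest checkpoint always goes first.

    `saves` holds (chkpt_name, validation_loss) pairs. The loss is carried only
    to show which checkpoint was best; the queue itself never looks at it.
    """
    checkpoint_queue = deque([], maxlen=keep_checkpoint)
    removed = []
    for chkpt_name, _loss in saves:
        if len(checkpoint_queue) == checkpoint_queue.maxlen:
            todel = checkpoint_queue.popleft()
            removed.append(todel)  # stand-in for self._rm_checkpoint(todel)
        checkpoint_queue.append(chkpt_name)
    return list(checkpoint_queue), removed


# Hypothetical run: the lowest loss (best model) comes from the first save.
saves = [("model_step_1000.pt", 1.9), ("model_step_2000.pt", 2.1),
         ("model_step_3000.pt", 2.3), ("model_step_4000.pt", 2.4)]
kept, removed = simulate_fifo(saves, keep_checkpoint=2)
print("kept:", kept)        # kept: ['model_step_3000.pt', 'model_step_4000.pt']
print("removed:", removed)  # removed: ['model_step_1000.pt', 'model_step_2000.pt']
```

With `keep_checkpoint=2`, the best checkpoint (loss 1.9) is the first one deleted, and only the two worst checkpoints survive.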

As an alternative approach, I would suggest that, if validation is done before each save step, the validation loss is also passed to the `save` method. `self.checkpoint_queue` could then contain `(loss, chkpt_name)` tuples, and after each append the queue would be sorted on `loss`. That way, only the worst-performing models are removed.
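A minimal sketch of this proposal, assuming lower loss is better. `ScoredCheckpointQueue` and its `add` method are hypothetical names, and `rm_checkpoint` is a callback standing in for `ModelSaver._rm_checkpoint`. A plain list replaces the deque because `collections.deque` has no `sort()` method.

```python
from typing import Callable, List, Tuple


class ScoredCheckpointQueue:
    """Keep at most `keep_checkpoint` checkpoints, evicting the highest loss.

    Entries are (loss, chkpt_name) tuples kept sorted by loss, so the worst
    checkpoint is always at the end of the list.
    """

    def __init__(self, keep_checkpoint: int, rm_checkpoint: Callable[[str], None]):
        self.keep_checkpoint = keep_checkpoint
        self.rm_checkpoint = rm_checkpoint  # stand-in for ModelSaver._rm_checkpoint
        self.checkpoint_queue: List[Tuple[float, str]] = []

    def add(self, loss: float, chkpt_name: str) -> None:
        self.checkpoint_queue.append((loss, chkpt_name))
        self.checkpoint_queue.sort()  # ascending loss: best first, worst last
        if len(self.checkpoint_queue) > self.keep_checkpoint:
            _worst_loss, todel = self.checkpoint_queue.pop()
            self.rm_checkpoint(todel)


removed: List[str] = []
queue = ScoredCheckpointQueue(keep_checkpoint=2, rm_checkpoint=removed.append)
for loss, name in [(1.9, "model_step_1000.pt"), (2.1, "model_step_2000.pt"),
                   (2.3, "model_step_3000.pt"), (1.8, "model_step_4000.pt")]:
    queue.add(loss, name)
print([name for _, name in queue.checkpoint_queue])
# ['model_step_4000.pt', 'model_step_1000.pt']
print(removed)  # ['model_step_3000.pt', 'model_step_2000.pt']
```

Note that the checkpoint just written can itself be deleted straight away if it is the worst one, which is exactly the "only the worst-performing models are removed" behavior.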

Things to consider: `ModelSaver` would then need to know whether the metric is higher-is-better or lower-is-better, and a fallback needs to be in place when no loss is passed.
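One way to handle both points is to map every checkpoint to a sortable eviction key. The sketch below is an assumption, not existing OpenNMT-py code: `eviction_key`, `CheckpointKeeper`, `higher_is_better` and the optional `score` argument are hypothetical. The fallback chosen here puts unscored checkpoints in a separate tier that is evicted first, oldest first, so a run that never passes a score behaves exactly like the current FIFO queue.

```python
from typing import Callable, List, Optional, Tuple

# Key whose *largest* value marks the checkpoint to delete first.
Key = Tuple[int, float, int]


def eviction_key(score: Optional[float], step: int, higher_is_better: bool) -> Key:
    if score is None:
        # Fallback tier: unscored checkpoints are deleted before any scored
        # one, oldest first (larger -step means a smaller, i.e. older, step).
        return (1, 0.0, -step)
    # Scored tier: express how bad the metric is, whatever its direction.
    badness = -score if higher_is_better else score
    return (0, badness, -step)


class CheckpointKeeper:
    """Hypothetical score-aware replacement for ModelSaver's checkpoint_queue."""

    def __init__(self, keep_checkpoint: int, rm_checkpoint: Callable[[str], None],
                 higher_is_better: bool = False):
        self.keep_checkpoint = keep_checkpoint
        self.rm_checkpoint = rm_checkpoint
        self.higher_is_better = higher_is_better
        self.entries: List[Tuple[Key, str]] = []

    def add(self, step: int, chkpt_name: str, score: Optional[float] = None) -> None:
        key = eviction_key(score, step, self.higher_is_better)
        self.entries.append((key, chkpt_name))
        if len(self.entries) > self.keep_checkpoint:
            worst = max(self.entries)  # highest key is the first to go
            self.entries.remove(worst)
            self.rm_checkpoint(worst[1])


# Accuracy-like metric (higher is better), keep two checkpoints.
removed: List[str] = []
keeper = CheckpointKeeper(2, removed.append, higher_is_better=True)
keeper.add(1000, "model_step_1000.pt", score=61.0)
keeper.add(2000, "model_step_2000.pt", score=58.5)
keeper.add(3000, "model_step_3000.pt", score=60.2)
print(removed)  # ['model_step_2000.pt']

# Fallback: with no scores at all this matches the current FIFO behavior.
fifo_removed: List[str] = []
fifo = CheckpointKeeper(2, fifo_removed.append)
for step in (1000, 2000, 3000):
    fifo.add(step, f"model_step_{step}.pt")
print(fifo_removed)  # ['model_step_1000.pt']
```

Evicting unscored checkpoints first is only one possible policy; keeping them but never letting them displace a scored one would be an equally reasonable choice.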

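```python
if self.keep_checkpoint > 0:
    # checkpoint_queue is a deque with maxlen == keep_checkpoint. Once it is
    # full, the oldest checkpoint is deleted, whatever its validation score.
    if len(self.checkpoint_queue) == self.checkpoint_queue.maxlen:
        todel = self.checkpoint_queue.popleft()
        self._rm_checkpoint(todel)
    self.checkpoint_queue.append(chkpt_name)
```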

```python
# Saving is triggered purely by step count; no validation score reaches save().
if (self.model_saver is not None
        and (save_checkpoint_steps != 0
             and step % save_checkpoint_steps == 0)):
    self.model_saver.save(step, moving_average=self.moving_average)
```
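On the trainer side, a loss can only be attached to a save when validation actually ran at that step. The helper below is a hypothetical sketch: `save_decision` and `last_valid_loss` are illustrative names, and it assumes validation happens when `step % valid_steps == 0`. It keeps the save condition quoted above and returns a score only when both schedules coincide.

```python
from typing import Optional, Tuple


def save_decision(step: int, save_checkpoint_steps: int, valid_steps: int,
                  last_valid_loss: Optional[float]) -> Tuple[bool, Optional[float]]:
    """Return (should_save, score) for the current training step.

    should_save mirrors the trainer condition. A score is attached only when
    validation also ran at this step (assumed: step % valid_steps == 0), so a
    stale loss from an earlier validation is never paired with a newer
    checkpoint; otherwise score is None and the saver needs its fallback.
    """
    should_save = save_checkpoint_steps != 0 and step % save_checkpoint_steps == 0
    validated_now = valid_steps != 0 and step % valid_steps == 0
    score = last_valid_loss if (should_save and validated_now) else None
    return should_save, score


print(save_decision(5000, 5000, 5000, 2.07))   # (True, 2.07)
print(save_decision(5000, 5000, 10000, 2.07))  # (True, None)
print(save_decision(7500, 5000, 2500, 2.07))   # (False, None)
```

The trainer could then pass the result along as something like `self.model_saver.save(step, moving_average=self.moving_average, score=score)`; the `score` keyword is a hypothetical addition, since the call quoted above passes only `step` and `moving_average`.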

Activity

francoishernandez commented on Nov 24, 2020

BramVanroy commented on Nov 24, 2020

francoishernandez commented on Nov 24, 2020
