Created by: stephenroller
Patch Description
THIS DEPENDS ON METASEQ-INTERNAL CHANGES.
Right now our checkpoint files have the format `checkpoint_7_5500.pt`, where `7` is the "epoch" number (for our weird definition of epoch). This is annoying when you're searching for a particular checkpoint: I know I want update 5500, but I have no idea what the epoch number should be. It's even worse with Blob store, which can't handle wildcards in the middle of a name.
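To illustrate the lookup problem, here is a minimal sketch using a made-up list of old-style names. `list_with_prefix` is a hypothetical stand-in for a store whose listing API only filters by name prefix, which is the usual model for object stores. A local mid-string wildcard finds update 5500 directly, but a prefix-only listing can't narrow past `checkpoint_` without knowing the epoch.

```python
import fnmatch

# Hypothetical listing of old-style names: checkpoint_{epoch}_{update}.pt
old_names = [
    "checkpoint_6_5000.pt",
    "checkpoint_7_5500.pt",
    "checkpoint_7_6000.pt",
]

def list_with_prefix(names, prefix):
    """Hypothetical model of a store that can only filter by name prefix."""
    return [n for n in names if n.startswith(prefix)]

# Locally, a wildcard in the middle finds update 5500 without knowing the epoch.
local_match = fnmatch.filter(old_names, "checkpoint_*_5500.pt")

# With prefix-only listing and an unknown epoch, the longest usable prefix is
# "checkpoint_", which returns every checkpoint and must be filtered client-side.
prefix_match = list_with_prefix(old_names, "checkpoint_")
```

Here `local_match` is `["checkpoint_7_5500.pt"]`, while `prefix_match` is the entire listing.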
This patch changes the format to `checkpoint_5500.pt`, stripping the epoch number.
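A minimal sketch of what the new naming buys, not the code in this patch: `checkpoint_name` and `parse_update` are hypothetical helpers. With the epoch gone, the file name for a given update can be built directly. The regex also accepts the old `checkpoint_{epoch}_{update}.pt` form, on the assumption that older checkpoints may still need to be read.

```python
import re
from typing import Optional

def checkpoint_name(num_updates: int) -> str:
    """New-style name: the update count alone identifies the checkpoint."""
    return f"checkpoint_{num_updates}.pt"

# Matches checkpoint_5500.pt (new) and checkpoint_7_5500.pt (old);
# the optional non-capturing group absorbs a legacy epoch number.
_CKPT_RE = re.compile(r"^checkpoint_(?:(\d+)_)?(\d+)\.pt$")

def parse_update(name: str) -> Optional[int]:
    """Return the update number encoded in a checkpoint name, or None."""
    m = _CKPT_RE.match(name)
    return int(m.group(2)) if m else None
```

For example, `checkpoint_name(5500)` gives `"checkpoint_5500.pt"`, and `parse_update` returns `5500` for both `"checkpoint_5500.pt"` and `"checkpoint_7_5500.pt"`.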
Testing steps
My latest checkpoints were trained with this!