Created by: suchenzang
Tested:
2022-11-09 12:41:47 | INFO | metaseq.checkpoint_utils | Preparing to save checkpoint for epoch 1 @ 10 updates
2022-11-09 12:41:47 | INFO | metaseq.trainer | Saving checkpoint to /scratch/slurm_tmpdir/164255/<redacted>/checkpoint_10-model_part-0-shard0.pt
2022-11-09 12:41:48 | INFO | metaseq.trainer | Finished saving checkpoint to /scratch/slurm_tmpdir/<redacted>/checkpoint_10-model_part-0-shard0.pt
2022-11-09 12:41:48 | INFO | metaseq.checkpoint_utils | Saved checkpoint /scratch/slurm_tmpdir/164255/<redacted>/checkpoint_10-model_part-0-shard0.pt (epoch 1 @ 10 updates) (writing took 1.2194619262591004 seconds)
Submitted batch job 164257
2022-11-09 12:41:48 | INFO | metaseq.cli.train | begin validation on "valid/thread_<redacted>_downsample" subset on rank 0
2022-11-09 12:41:48 | INFO | metaseq.tasks.streaming_language_modeling | setting shuffle buffer size to 10240