Created by: suchenzang
This is causing checkpoints loaded from blob store to barf:
    cli_main()
  File "/shared/home/<redacted>/metaseq-internal/./<redacted>/metaseq_cli/train.py", line 618, in cli_main
    main(cfg, **kwargs)
  File "/shared/home/<redacted>/metaseq-internal/./<redacted>/metaseq_cli/train.py", line 155, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
  File "/shared/home/<redacted>/metaseq-internal/<redacted>/metaseq/checkpoint_utils.py", line 289, in load_checkpoint
    if verify_shards(cfg, dir=dir, checkpoint_name=checkpoint_name):
  File "/shared/home/<redacted>/metaseq-internal/<redacted>/metaseq/checkpoint_utils.py", line 187, in verify_shards
    for file in os.listdir(dir):
FileNotFoundError: [Errno 2] No such file or directory: 'https://<redacted>'
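The checkpoint corruption check needs to be reworked so that it runs only after the download from blob store has completed. Right now verify_shards calls os.listdir on the blob store URL, which is not a local directory, hence the FileNotFoundError.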
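
One possible shape for the fix, as a minimal sketch rather than the actual metaseq change: resolve a remote path to a local directory first, and only then run the shard check. The names `is_remote`, `resolve_then_verify`, `download`, `verify`, and `local_cache` are all hypothetical and do not come from metaseq; the real download and verification routines would be passed in as callables.

```python
import os
from typing import Callable


def is_remote(path: str) -> bool:
    """Treat http(s) URLs as remote; everything else is assumed to be local."""
    return path.startswith(("http://", "https://"))


def resolve_then_verify(
    path: str,
    download: Callable[[str, str], None],  # hypothetical: copies shards from URL into a local dir
    verify: Callable[[str], bool],         # hypothetical: stands in for the shard corruption check
    local_cache: str,
) -> bool:
    """Download remote checkpoints first, then verify the local copy."""
    if is_remote(path):
        os.makedirs(local_cache, exist_ok=True)
        download(path, local_cache)
        path = local_cache
    # At this point `path` is a local directory, so os.listdir inside
    # `verify` no longer receives a URL.
    return verify(path)
```

With this ordering, a check that lists the directory only ever sees a local path. Whether the check should be skipped entirely for remote checkpoints, or run on the downloaded copy as above, is a design choice left open here.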