Created by: Xirider
Added an E2E test for training resumption and the Azure checkpoint storage logic. Issues: https://github.com/facebookresearch/metaseq/issues/351 and https://github.com/facebookresearch/metaseq/issues/268
- Training runs for 20 steps -> check that the correct checkpoints are created, that they are uploaded to Azure Blob, and that the train loss is correct
- Training resumes for 15 steps -> check that the last checkpoints are "downloaded" from Azure Blob and loaded, and that the final loss is correct
Azure Blob download and upload are mocked out.
Note that newly spawned subprocesses do not inherit mocked objects from the parent process, so I had to pass functions that create the mocks inside the subprocesses instead.
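
For reviewers unfamiliar with the pattern, here is a minimal sketch of the idea, not the actual test code. All names in it (`BlobStorage`, `patch_upload`, `train_worker`, `run_in_spawned_process`) are hypothetical and only illustrate the mechanism: a patch applied in the parent does not exist in a process started with the `spawn` method, so the worker receives a picklable, module-level function and applies the patch itself.

```python
import multiprocessing as mp
from unittest import mock


class BlobStorage:
    """Hypothetical stand-in for the real storage client (not metaseq code)."""

    @staticmethod
    def upload(path):
        # The real upload must never run inside a test.
        raise RuntimeError(f"would upload {path} to real storage")


def patch_upload():
    """Build the patcher in whichever process calls this function.

    Only this module-level function is handed to the child process.
    The mock object itself is created on the other side.
    """
    return mock.patch.object(BlobStorage, "upload", return_value="mocked-upload")


def train_worker(make_patch, results):
    """Hypothetical worker: applies the patch locally, then 'trains'."""
    with make_patch() as fake_upload:
        outcome = BlobStorage.upload("checkpoint_20.pt")
        # Report what the worker saw back to the caller.
        results.put((outcome, fake_upload.call_count))


def run_in_spawned_process(make_patch):
    """Run the worker in a fresh interpreter, where parent patches do not exist."""
    ctx = mp.get_context("spawn")
    results = ctx.Queue()
    proc = ctx.Process(target=train_worker, args=(make_patch, results))
    proc.start()
    outcome = results.get(timeout=60)
    proc.join()
    return outcome
```

When this is saved as a script, calling `run_in_spawned_process(patch_upload)` under an `if __name__ == "__main__":` guard should return the mocked result rather than raising, because the patch is created inside the child.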