Created by: KUNAL1612
Patch Description
- If a checkpoint is corrupt (missing shards), load the last good checkpoint in the checkpoint directory
- If the user passes a custom checkpoint through restore_file, verify that it is not corrupt; otherwise, load the last good checkpoint
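
A minimal sketch of the fallback behavior described above, not the patch's actual code. It assumes a hypothetical on-disk layout in which each checkpoint is a set of files named `checkpoint_<step>-shard<i>.pt` and the expected shard count is known; the helper names `is_complete`, `last_good_checkpoint`, and `resolve_checkpoint` are illustrative only.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme: checkpoint_<step>-shard<i>.pt
SHARD_RE = re.compile(r"^checkpoint_(?P<step>\d+)-shard\d+\.pt$")


def is_complete(ckpt_dir: Path, name: str, num_shards: int) -> bool:
    """A checkpoint counts as good only if every expected shard file exists."""
    return all(
        (ckpt_dir / f"{name}-shard{i}.pt").is_file() for i in range(num_shards)
    )


def last_good_checkpoint(ckpt_dir: Path, num_shards: int) -> Optional[str]:
    """Return the highest-step checkpoint with all shards present, or None."""
    steps = set()
    for path in ckpt_dir.glob("checkpoint_*-shard*.pt"):
        match = SHARD_RE.match(path.name)
        if match:
            steps.add(int(match.group("step")))
    # Walk from newest to oldest and stop at the first complete checkpoint.
    for step in sorted(steps, reverse=True):
        name = f"checkpoint_{step}"
        if is_complete(ckpt_dir, name, num_shards):
            return name
    return None


def resolve_checkpoint(
    ckpt_dir: Path, restore_file: Optional[str], num_shards: int
) -> Optional[str]:
    """Prefer the user-supplied checkpoint if intact, else fall back."""
    if restore_file is not None and is_complete(ckpt_dir, restore_file, num_shards):
        return restore_file
    return last_good_checkpoint(ckpt_dir, num_shards)
```

Under these assumptions, a missing or partially written `restore_file` is treated the same way as a corrupt auto-generated checkpoint: both fall through to the newest checkpoint whose shards are all present.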
Testing steps
- Manually removed shards from auto-generated last checkpoints
- Passed nonexistent / corrupted checkpoints; in each case the model automatically loaded the last good checkpoint