Created by: zdevito
Each dataloader worker reports its ID to the training process, which uses it to keep a count of how many sentences each worker has processed and to track which worker is next. On restart, however, worker IDs have to be rotated by worker_offset, and in this case the offset was not added to the ID.
TODO: add a test that reproduces the bug. It will need these steps:
- Train normally for a number of steps that is not divisible by num_workers
- Checkpoint
- Resume and train for a number of steps that is not divisible by num_workers
- Checkpoint
- Resume (the old code will resume with the wrong worker)
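
The steps above can be sketched with a toy model. This is not the actual dataloader code: it assumes round-robin workers, that worker_offset on restart equals the trainer's saved "next worker", and that a fresh dataloader always starts at local worker 0. The names `train`, `repro`, `add_offset`, and the state dictionary are hypothetical and exist only for illustration.

```python
NUM_WORKERS = 4  # assumed worker count for this toy model


def train(steps, state, add_offset):
    """Run `steps` steps and return the updated trainer-side state.

    `state` is a hypothetical checkpoint: per-worker counts plus the
    worker the trainer expects to hear from next.
    """
    counts = list(state["counts"])
    next_worker = state["next_worker"]
    # Assumption: on restart, IDs are rotated by the saved next worker.
    worker_offset = state["next_worker"]
    for step in range(steps):
        # A freshly started dataloader begins at local worker 0.
        local_id = step % NUM_WORKERS
        if add_offset:
            reported = (local_id + worker_offset) % NUM_WORKERS  # fixed behavior
        else:
            reported = local_id  # old behavior: offset not added
        counts[reported] += 1
        next_worker = (reported + 1) % NUM_WORKERS
    return {"counts": counts, "next_worker": next_worker}


def repro(add_offset):
    """Train 5 steps, checkpoint, resume for 3 steps, checkpoint."""
    state = {"counts": [0] * NUM_WORKERS, "next_worker": 0}
    state = train(5, state, add_offset)  # 5 is not divisible by 4
    state = train(3, state, add_offset)  # 3 is not divisible by 4
    return state


fixed = repro(add_offset=True)
buggy = repro(add_offset=False)
print(fixed)  # {'counts': [2, 2, 2, 2], 'next_worker': 0}
print(buggy)  # {'counts': [3, 2, 2, 1], 'next_worker': 3}
```

In this model the first resume is unaffected, because the state saved by the first run is still correct. The state only goes wrong during the resumed run, which is why the repro needs a second checkpoint and a second resume: after 8 total steps the next worker should be 0, but without the offset the saved value is 3.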