Created by: zdevito
I misunderstood how the DataLoader works. I thought it simply round-robins through the workers, asking each in turn for the next batch. That is true until one of the workers runs out of batches. At that point it keeps asking the remaining workers for batches until they have all run out; I had assumed it would stop as soon as the first worker ran out. As a result, the code accounted for sequences_consumed on each worker inaccurately at the very end of an epoch, when some workers have no data left.
This small patch fixes the bug and makes the accounting more robust by passing the worker_id to the StreamingCountingIterator, instead of having it try to infer which worker each batch came from.
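To illustrate the failure mode, here is a minimal sketch in plain Python. It is a simplified model of the behavior described above, not the DataLoader's actual implementation, and `round_robin_batches`, `guessed`, and `actual` are hypothetical names chosen for this example. It compares attributing each batch to `batch_index % num_workers` with counting by an explicitly passed `worker_id`.

```python
from collections import Counter


def round_robin_batches(worker_batches):
    """Yield (worker_id, batch) pairs in round-robin order.

    Simplified model (an assumption, not PyTorch code): once a worker
    runs out of batches it is skipped, and the remaining workers keep
    being asked until all of them are exhausted.
    """
    iterators = [iter(batches) for batches in worker_batches]
    active = list(range(len(iterators)))
    while active:
        still_active = []
        for worker_id in active:
            try:
                batch = next(iterators[worker_id])
            except StopIteration:
                continue  # this worker is done; drop it from the rotation
            still_active.append(worker_id)
            yield worker_id, batch
        active = still_active


# Worker 1 has only one batch, so it runs out before the others.
worker_batches = [["a0", "a1", "a2"], ["b0"], ["c0", "c1", "c2"]]
num_workers = len(worker_batches)

guessed = Counter()  # attribution inferred from the batch index
actual = Counter()   # attribution from the worker_id passed along

for index, (worker_id, _batch) in enumerate(round_robin_batches(worker_batches)):
    guessed[index % num_workers] += 1
    actual[worker_id] += 1

print("guessed:", dict(guessed))
print("actual: ", dict(actual))
```

In this toy run the two counts agree for the first full round and diverge once worker 1 is exhausted, which matches the end-of-epoch inaccuracy described above. Passing the worker id alongside the batch avoids having to reproduce the scheduling logic at all.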