Created by: sriniiyer
Description: Currently, multiple examples are packed into a single sequence during training/fine-tuning, so attention can cross example boundaries within that sequence. This diff adds an eos break mode, which places a single example in each sequence. This can be useful for fine-tuning and debugging.
Test Plan: The code path is exercised via --break-mode eos_pad_8; the code was tested earlier in a private branch.
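
To illustrate the difference between packing and one-example-per-sequence described above, here is a minimal standalone sketch. It is not the code in this diff: the helper names, the token ids for EOS and PAD, and the reading of eos_pad_8 as "one example per sequence, padded to a multiple of 8" are all assumptions made for illustration.

```python
from typing import List

EOS = 2  # hypothetical end-of-sequence token id
PAD = 1  # hypothetical padding token id


def pack_examples(examples: List[List[int]], block_size: int) -> List[List[int]]:
    """Packed behaviour: concatenate all examples into one token stream
    and cut it into fixed-size blocks, so a block can span several examples."""
    stream = [tok for ex in examples for tok in ex]
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


def one_example_per_sequence(
    examples: List[List[int]], pad_to_multiple: int = 8
) -> List[List[int]]:
    """Assumed eos-style behaviour: each example becomes its own sequence,
    right-padded so its length is a multiple of pad_to_multiple."""
    sequences = []
    for ex in examples:
        # Round the length up to the next multiple of pad_to_multiple.
        padded_len = -(-len(ex) // pad_to_multiple) * pad_to_multiple
        sequences.append(ex + [PAD] * (padded_len - len(ex)))
    return sequences


examples = [[10, 11, 12, EOS], [20, 21, EOS], [30, 31, 32, 33, 34, EOS]]

packed = pack_examples(examples, block_size=8)
separate = one_example_per_sequence(examples)

print(packed)    # blocks mix tokens from different examples
print(separate)  # one padded example per sequence
```

In the packed output, the first block contains tokens from all three examples, which is where cross-example attention comes from. In the one-example-per-sequence output, each sequence holds exactly one EOS, at the cost of some padding.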