Created by: zdevito
Training runs using StreamingSrcTgtDataset were failing because it did not perform the same token-length caching as DocumentsToSequences.
StreamingSrcTgtDataset is really just another instance of StreamingTokenBlockDataset where the blocks are split into a tuple (src, target). To avoid duplication, this PR adds support for this case directly to DocumentToSequences, along with a test to verify that it replicates the old behavior.
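
For readers unfamiliar with the relationship between the two datasets, here is a minimal standalone sketch of the idea. It is not the code in this PR: the function names (`iter_token_blocks`, `split_src_tgt`, `iter_src_tgt`) are hypothetical, and the shift-by-one split (target is the source advanced by one token) is an assumption based on the usual language-modeling setup, not something stated above.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_token_blocks(docs: Iterable[List[int]], block_size: int) -> Iterator[List[int]]:
    """Concatenate tokenized documents and yield blocks of `block_size` tokens.

    The final block may be shorter than `block_size`.
    """
    buffer: List[int] = []
    for doc in docs:
        buffer.extend(doc)
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    if buffer:
        yield buffer


def split_src_tgt(block: List[int]) -> Tuple[List[int], List[int]]:
    """Split one block into (src, target).

    Assumption: target is src shifted by one token.
    """
    return block[:-1], block[1:]


def iter_src_tgt(docs: Iterable[List[int]], block_size: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (src, target) pairs where each src has at most `block_size` tokens."""
    # Read one extra token per block so that src and target both reach block_size.
    for block in iter_token_blocks(docs, block_size + 1):
        if len(block) > 1:  # a single-token block has no target to predict
            yield split_src_tgt(block)


pairs = list(iter_src_tgt([[1, 2, 3], [4, 5, 6, 7]], block_size=3))
```

The point of the sketch is that the (src, target) case only adds a final split step on top of the block iterator, which is why it can share one implementation (and one caching path) with the plain block case.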