Created by: stephenroller
Patch Description While this is simply fixing code we plan to deprecate anyway, I think we need this.
Some of the code paths (namely the generation path for evals) uses this BPE config to instantiate tokenizers. Unfortunately, since this field is missing from the dataclass, it doesn't get populated so when we load up models over there, we can't tell that the tokenizer should be instantiated, and we end up with this hardcoded gpt2 tokenizer path.
This patch fixes that by allowing the tokenizer to load in this BPE class.
Testing steps Evaluating a model trained with a different tokenizer.