Created by: andrewPoulton
Attempts to address #308 by changing the tokenizer path to point at a single, universal HF-format file.
Since a lot of existing user code will inevitably have the old merges/vocab file paths baked in, I included a script that converts the old format to the new one. An error (listing the mitigation steps) is raised if the new format is not used.
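Here is a minimal sketch of what such a check could look like. It assumes the old format is detected by separate vocab/merges arguments being passed and the new format is a single `.json` file. The function, argument and exception names (`resolve_tokenizer_path`, `vocab_file`, `merge_file`, `LegacyTokenizerFormatError`) are hypothetical and not taken from this PR's diff.

```python
class LegacyTokenizerFormatError(ValueError):
    """Hypothetical error raised when the old vocab/merges pair is used."""


# Hypothetical mitigation text shown to users still on the old format.
MITIGATION = (
    "Convert the vocab/merges pair into a single HF-format tokenizer file "
    "using the provided conversion script, then point the tokenizer path "
    "at the resulting .json file."
)


def resolve_tokenizer_path(tokenizer_path, vocab_file=None, merge_file=None):
    """Return the single tokenizer file path, or raise with mitigation steps.

    Assumption: legacy configs pass separate vocab/merges files, while the
    new format is one JSON file.
    """
    # Legacy configs supply two separate files; reject them explicitly.
    if vocab_file is not None or merge_file is not None:
        raise LegacyTokenizerFormatError(
            "Separate vocab/merges files are no longer supported. " + MITIGATION
        )
    # The new format is expected to be a single JSON file.
    if not tokenizer_path or not str(tokenizer_path).endswith(".json"):
        raise LegacyTokenizerFormatError(
            f"Expected a single HF-format .json tokenizer file, got {tokenizer_path!r}. "
            + MITIGATION
        )
    return tokenizer_path
```

Raising a dedicated `ValueError` subclass lets callers catch the legacy-format case specifically while still surfacing the fix in the message itself.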
Note: this does not address the hardcoded args for the API, since the task is set to language_modelling there.