Created by: patrickvonplaten
If we don't transfer the "decoder.version" key to the singleton checkpoint, a very sneaky bug occurs, which @thomasw21 found as part of this PR: https://github.com/huggingface/transformers/pull/17785
If the decoder.version param is not present in the state_dict, then upon loading the singleton checkpoint the loaded layer_norm is set to None
here: https://github.com/facebookresearch/metaseq/blob/e0c4f6b0e4c523906ad8d561f727e3f2ac3a8e73/metaseq/models/transformer.py#L932
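
For context, the logic at that line follows the usual fairseq-style version check that metaseq inherits. Here is a minimal self-contained sketch of the behavior (a paraphrase for illustration, not verbatim code from the linked commit):

```python
import torch

def keeps_final_layer_norm(state_dict, name="decoder"):
    # Mimics the fairseq-style upgrade check: a missing "<name>.version"
    # falls back to 1, which is <= 2, so the final layer_norm is dropped.
    version_key = f"{name}.version"
    version = state_dict.get(version_key, torch.Tensor([1]))[0].item()
    return version > 2  # False means self.layer_norm gets set to None

print(keeps_final_layer_norm({}))                                        # False -> the sneaky bug
print(keeps_final_layer_norm({"decoder.version": torch.tensor([3.0])}))  # True  -> layer_norm kept
```
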
So it's absolutely crucial that we include this variable.
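
A minimal sketch of the fix, assuming the consolidated singleton checkpoint stores its weights under a "model" key (the file path and the version value 3 are illustrative assumptions, not taken from this patch):

```python
import torch

ckpt = torch.load("restored.pt", map_location="cpu")  # hypothetical path
state_dict = ckpt["model"]

if "decoder.version" not in state_dict:
    # Any value > 2 preserves the final layer_norm on load.
    state_dict["decoder.version"] = torch.tensor([3.0])

torch.save(ckpt, "restored.pt")
```
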
I will update all the converted HF checkpoints here later today, and then I think we can be sure that OPT works correctly :partying_face: https://huggingface.co/models?other=opt_metasq