Created by: patrickvonplaten
If we don't transfer the `decoder.version` param to the singleton checkpoint, a very sneaky bug happens, which was found by @thomasw21 as part of this PR: if `decoder.version` is not present in the state_dict, then upon loading the singleton checkpoint the loaded `layer_norm` is set to `None`, so the model silently runs without it.
So it's absolutely crucial that we include this variable.
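To make the failure mode concrete, here is a minimal sketch of how a missing version key can silently drop the layer norm. It assumes the decoder uses fairseq-style upgrade logic, where a missing `decoder.version` falls back to an old version number. `ToyDecoder`, `load`, and the version threshold are hypothetical stand-ins for illustration, not the actual loading code.

```python
class ToyDecoder:
    """Hypothetical stand-in for the real decoder (illustration only)."""

    def __init__(self):
        # Placeholder string standing in for the final LayerNorm module.
        self.layer_norm = "final_layer_norm"

    def upgrade_state_dict(self, state_dict, name="decoder"):
        version_key = f"{name}.version"
        # Assumption: a missing key is treated as version 1, i.e. an old checkpoint.
        version = state_dict.get(version_key, 1)
        if version <= 2:
            # Old checkpoints are assumed to have no final layer norm,
            # so it is dropped on load without any error or warning.
            self.layer_norm = None
        return state_dict


def load(state_dict):
    """Build a toy decoder and run the upgrade step on the given state dict."""
    decoder = ToyDecoder()
    decoder.upgrade_state_dict(state_dict)
    return decoder


# Checkpoint converted without the version key: layer norm is lost.
without_version = load({"decoder.layer_norm.weight": [1.0]})
# Checkpoint converted with the version key: layer norm is kept.
with_version = load({"decoder.layer_norm.weight": [1.0], "decoder.version": 3})

print(without_version.layer_norm)
print(with_version.layer_norm)
```

Nothing fails loudly in the first case, which is what makes the bug so easy to miss.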
I will update all of the converted HF checkpoints here later today and then I think we can be sure that OPT works correctly :partying_face: