Created by: urielsinger
In non-FSDP mode
When not using FSDP, the model was not cast to bf16 even though bf16 was set to True.
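A minimal sketch of the expected behaviour, not metaseq's actual code; `cfg.bf16` and `cfg.use_fsdp` are placeholder names for whatever the real config flags are called:

```python
import torch
import torch.nn as nn


def maybe_cast_to_bf16(model: nn.Module, cfg) -> nn.Module:
    # With FSDP the wrapper handles parameter dtypes itself; without it,
    # the model has to be cast explicitly or it silently stays in fp32.
    if cfg.bf16 and not cfg.use_fsdp:
        model = model.to(dtype=torch.bfloat16)
    return model
```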
optimizer._multiply_factor reset to 1.0 each step
In MemoryEfficientFP16Optimizer.zero_grad (which is called every training step), the optimizer sets _multiply_factor back to 1.0: https://github.com/facebookresearch/metaseq/blob/bbcedfebb4c35f71cdda1f1a358491f3996a9fc3/metaseq/optim/fp16_optimizer.py#L452. The same reset is also applied in the regular FP16Optimizer.
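A simplified sketch of the pattern in question (not the real metaseq class); it only illustrates where the lazily-applied scaling factor gets cleared:

```python
import torch


class SketchFP16Optimizer:
    """Illustrative wrapper: gradients are unscaled lazily via _multiply_factor."""

    def __init__(self, optimizer: torch.optim.Optimizer, loss_scale: float = 2.0 ** 7):
        self.wrapped_optimizer = optimizer
        self.loss_scale = loss_scale
        self._multiply_factor = 1.0

    def backward(self, loss: torch.Tensor):
        # Scale the loss up, and remember to scale gradients back down later.
        (loss * self.loss_scale).backward()
        self._multiply_factor *= 1.0 / self.loss_scale

    def zero_grad(self):
        # Called at the start of every training step.
        self.wrapped_optimizer.zero_grad()
        # The reset in question: the factor is set back to 1.0 here,
        # discarding whatever was accumulated for the previous step.
        self._multiply_factor = 1.0
```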