Created by: urielsinger
In non-FSDP mode
When not using FSDP, the model was not cast to bf16 even though bf16 was set to True.
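A minimal sketch of the expected behaviour, not metaseq's actual code; `cfg.bf16` and `cfg.use_fsdp` are placeholder names for whatever the real config flags are called:

```python
import torch
import torch.nn as nn


def maybe_cast_to_bf16(model: nn.Module, cfg) -> nn.Module:
    # With FSDP the wrapper handles parameter dtypes itself; without it,
    # the model has to be cast explicitly or it silently stays in fp32.
    if cfg.bf16 and not cfg.use_fsdp:
        model = model.to(dtype=torch.bfloat16)
    return model
```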
optimizer._multiply_factor reset to 1.0 each step
In MemoryEfficientFP16Optimizer.zero_grad (which is called every training step), the optimizer sets _multiply_factor back to 1.0: https://github.com/facebookresearch/metaseq/blob/bbcedfebb4c35f71cdda1f1a358491f3996a9fc3/metaseq/optim/fp16_optimizer.py#L452. The same reset is also applied in the regular FP16Optimizer.
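A simplified sketch of the pattern in question (not the real metaseq class); it only illustrates where the lazily-applied scaling factor gets cleared:

```python
import torch


class SketchFP16Optimizer:
    """Illustrative wrapper: gradients are unscaled lazily via _multiply_factor."""

    def __init__(self, optimizer: torch.optim.Optimizer, loss_scale: float = 2.0 ** 7):
        self.wrapped_optimizer = optimizer
        self.loss_scale = loss_scale
        self._multiply_factor = 1.0

    def backward(self, loss: torch.Tensor):
        # Scale the loss up, and remember to scale gradients back down later.
        (loss * self.loss_scale).backward()
        self._multiply_factor *= 1.0 / self.loss_scale

    def zero_grad(self):
        # Called at the start of every training step.
        self.wrapped_optimizer.zero_grad()
        # The reset in question: the factor is set back to 1.0 here,
        # discarding whatever was accumulated for the previous step.
        self._multiply_factor = 1.0
```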