Created by: ruanslv
Patch Description
Changes from https://github.com/facebookresearch/metaseq/pull/300 that are required to make the trainers work with the fairseq_v3 Megatron branch.
Testing steps
$ python -m sweep_baseline -g 8 -n 1 -t 1 --azure --model-size 125m --prefix fv3_test --local