Unify `transformer_lm_megatron` and `transformer_lm` arch
Created by: suchenzang
As part of unifying the model-parallel codepaths (https://github.com/facebookresearch/metaseq/issues/389), we should replace the two separate arch definitions, `transformer_lm_megatron` and `transformer_lm`, with a single one.