Unify model-parallel and non-model-parallel codepaths
Created by: suchenzang
As https://github.com/facebookresearch/metaseq/issues/383 flagged, we currently maintain two separate codepaths for model-parallel and non-model-parallel code. One example is https://github.com/facebookresearch/metaseq/blob/main/metaseq/model_parallel/models/transformer_lm.py vs. https://github.com/facebookresearch/metaseq/blob/main/metaseq/models/transformer_lm.py.
These should be unified into a single codepath.
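One possible shape for the unification, as a rough sketch only: a single model class takes the model-parallel size as a parameter and delegates the layer choice to one factory, so that a size of 1 yields the plain dense layer. Every name here (`LinearSpec`, `build_linear`, `UnifiedTransformerLM`, `model_parallel_size`) is hypothetical and not taken from metaseq, and the layers are stand-in descriptions rather than real modules.

```python
from dataclasses import dataclass

# NOTE: all names in this sketch are hypothetical illustrations,
# not metaseq APIs. Layers are described, not actually built.


@dataclass
class LinearSpec:
    kind: str          # "dense" or "column_parallel"
    in_features: int
    out_features: int  # output width held by each rank


def build_linear(in_features: int, out_features: int,
                 model_parallel_size: int = 1) -> LinearSpec:
    """Pick the layer variant from the model-parallel size in one place."""
    if model_parallel_size < 1:
        raise ValueError("model_parallel_size must be >= 1")
    if out_features % model_parallel_size != 0:
        raise ValueError("out_features must divide evenly across ranks")
    if model_parallel_size == 1:
        # Non-model-parallel case: an ordinary dense projection.
        return LinearSpec("dense", in_features, out_features)
    # Model-parallel case: each rank holds a slice of the output columns.
    return LinearSpec("column_parallel", in_features,
                      out_features // model_parallel_size)


class UnifiedTransformerLM:
    """Single model definition shared by both configurations."""

    def __init__(self, embed_dim: int, ffn_dim: int,
                 model_parallel_size: int = 1):
        self.fc1 = build_linear(embed_dim, ffn_dim, model_parallel_size)
```

With this layout, the non-model-parallel model is just the `model_parallel_size=1` case of the same class, so there is only one `transformer_lm` definition to maintain.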