Created by: bashnick
Patch Description
This PR removes unnecessary inheritance layers and flattens the class structure for better interpretability and transparency. Short summary (a before/after sketch follows the list):
- removed TransformerDecoder -> ModelParallelTransformerDecoder
- removed LanguageModel -> BaseModel
- removed TransformerDecoderLayer -> ModelParallelTransformerDecoderLayer
- removed MultiheadAttention -> ModelParallelMultiheadAttention
- removed arch transformer_lm -> transformer_lm_megatron
- updated the gpu_tests/test_hf_compatibility.py test to work with model_parallel
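To make the flattening concrete, here is a minimal before/after sketch using the attention class as an example; the class body, dimensions, and plain nn.Linear projections are illustrative only (the real code uses model-parallel projections), and the same pattern applies to the decoder, decoder layer, and base model classes listed above:

```python
import torch
import torch.nn as nn

# Before this PR (two layers of inheritance):
#
#   class MultiheadAttention(nn.Module): ...                        # generic base
#   class ModelParallelMultiheadAttention(MultiheadAttention): ...  # thin override layer
#
# After this PR the intermediate base is gone and the model-parallel class
# inherits from nn.Module directly, with the attention logic inlined.
# NOTE: illustrative sketch only; the actual class uses column/row-parallel
# projections rather than plain nn.Linear.

class ModelParallelMultiheadAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        bsz, seq_len, _ = x.shape
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq_len, embed_dim) -> (batch, num_heads, seq_len, head_dim)
            return t.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(bsz, seq_len, self.embed_dim)
        return self.out_proj(out)
```

With the intermediate bases removed, the class one sees in stack traces, module printouts, and checkpoint state_dicts is the model-parallel implementation itself, which is the interpretability/transparency gain this PR is after.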
Testing steps
- tested model run: python -m PROJECT_NAME.projects.MODEL_NAME.sweep_baseline -g 4 -n 1 --rsc --model-size 8m --tokenizer rsc --prefix NB000 --local --data /checkpoint/TEAM_NAME/datasets/consolidated/v4.0
- tested evaluations: FSD=/checkpoint/TEAM_NAME/datasets/few_shot_data python PROJECT_NAME/scripts/eval/schedule_jobs_few_shot_opt_evaluation.py -t copa cb flan_cb --model-name punitkoura_125m --model-path /checkpoint/TEAM_NAME/checkpoints/punitkoura/small_test_run/1000/checkpoint_1000.pt --model-template gptz_sharded_config --nshot 0 -o ~/MODEL_NAME/fix_scoring_001 --slurm-partition learn --combine-tasks --max-ingestible-tokens 4000
- tested a continued run, resuming training from an existing checkpoint