Created by: lilisierrayu
Patch Description: Fix model initialization for some of the OPT models.
Issue: Models up to 13B load successfully but then fail with a half/float dtype mismatch error, while the 30B model runs fine (see discussion: https://fb.workplace.com/groups/gogogptzusers/permalink/762243555001669/).
Debugging:
After loading the model from {azure_dir}/1.3B/consolidated_mp_2/consolidated.pt and printing [p.dtype for p in model.parameters()], the output shows a mix of torch.float16 and torch.float32.
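The dtype check above can be reproduced on any module. A minimal sketch (using a toy nn.Sequential stand-in rather than the actual OPT model) of how mixed precision shows up when only part of a model has been converted to fp16:

```python
import torch
import torch.nn as nn

# Toy stand-in for a partially-initialized model: one layer converted
# to fp16, the other left in the default fp32.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
model[0].half()

# Same check as in the debugging notes: collect parameter dtypes.
dtypes = {p.dtype for p in model.parameters()}
print(dtypes)  # a correctly initialized fp16 model would show only {torch.float16}
```

A healthy checkpoint load should yield a single dtype here; seeing both torch.float16 and torch.float32 is the symptom described above.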
Found that cfg.model.tensor_parallel_init_model_on_gpu = False in the 1.3B model config (it is True for the 30B model), so the model is not properly initialized in fp16. The mismatch fails silently because model.load_state_dict auto-casts checkpoint tensors to the destination parameter dtype instead of raising an error.
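The silent failure mode can be demonstrated in isolation: load_state_dict copies each checkpoint tensor into the existing parameter, casting to the destination's dtype rather than erroring on a mismatch. A sketch with toy nn.Linear modules (not the actual OPT loading path):

```python
import torch
import torch.nn as nn

# fp16 "checkpoint" weights, as saved from a half-precision model.
src = nn.Linear(4, 4).half()

# Destination model left in fp32, simulating the missing fp16 init.
dst = nn.Linear(4, 4)

# No error is raised: copy_ inside load_state_dict casts the fp16
# checkpoint tensors up to the destination's fp32 dtype.
dst.load_state_dict(src.state_dict())
print(next(dst.parameters()).dtype)  # torch.float32
```

So the bad dtype only surfaces later, at runtime, as a half/float mismatch between correctly- and incorrectly-initialized pieces of the model.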
Testing steps
With the fix, able to launch interactive_cli.py and interactive_hosted.py with the following model paths:
- f"--path {azure_dir}/2.7B/consolidated_mp_1/consolidated.pt"
- f"--path {azure_dir}/30B/consolidated_mp_4/reshard.pt"
- f"--path {azure_dir}/30B/consolidated_mp_2/consolidated.pt"
- f"--path {azure_dir}/1.3B/consolidated_mp_2/consolidated.pt"