Merged
requested to merge github/fork/bashnick/Unify-model-parallel-vs-non-model-parallel-codepaths into main
Created by: bashnick
Patch Description
Unified the model-parallel and non-model-parallel codepaths by merging the files from metaseq.model_parallel into metaseq.models. Resolves issue #389 (closed).
Testing steps
MP2
python -m PROJECT.sweep_baseline -g 4 -n 1 --model-size 8m --prefix test_01 --local --data /checkpoint/TEAM_NAME/datasets/consolidated/v4.0
MP1
python -m PROJECT.sweep_baseline -g 4 -n 1 --model-size 8m_mp1 --prefix test_01 --local --data /checkpoint/TEAM_NAME/datasets/consolidated/v4.0
Results
MP2
Metric | Before | After | Comment |
---|---|---|---|
num_updates | 100 | 100 | --- |
loss | 14.406 | 14.406 | --- |
wps | 911142 | 909002 | <0.25% difference, noise |
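
The "<0.25% difference" comment on the wps row can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `relative_change` and the 0.25% threshold variable are illustrative and not part of metaseq; the two wps values are copied from the table above.

```python
def relative_change(before: float, after: float) -> float:
    """Return the signed relative change going from `before` to `after`."""
    return (after - before) / before


# wps values copied from the MP2 results table
wps_before = 911142
wps_after = 909002

# Hypothetical noise threshold, taken from the table's "<0.25%" comment
noise_threshold = 0.0025

change = relative_change(wps_before, wps_after)
within_noise = abs(change) < noise_threshold

print(f"wps relative change: {change:.4%}")
print(f"within {noise_threshold:.2%} noise threshold: {within_noise}")
```

The same check applies to any other throughput metric compared across the two branches; loss and num_updates are identical before and after, so their relative change is zero.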
MP1
Metric | Before | After | Comment |
---|---|---|---|
num_updates | 100 | 100 | --- |
loss | 15.857 | 15.857 | --- |
After merging
Need to re-install metaseq:
rm -rf metaseq
(copy metaseq from the main branch)
cd ~/rsc/metaseq
pip install --no-build-isolation -e .