#460 and #479 require `fairseq_v3` branch of Megatron-LM
Created by: EIFY

## 🐛 Bug

#460 and #479 require the `fairseq_v3` branch of Megatron-LM.
## To Reproduce

- Set up as instructed, with the `fairseq_v2` branch of Megatron-LM.
- Edit `metaseq/service/constants.py` as necessary. In my case, follow https://github.com/facebookresearch/metaseq/issues/407#issuecomment-1293015551 (a sketch of the kind of edits involved is shown after the error log below).
- Run `metaseq-api-local`.
- See error:
```
$ metaseq-api-local
2022-11-17 23:51:51 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 0
2022-11-17 23:51:51 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 2
2022-11-17 23:51:51 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 3
2022-11-17 23:51:51 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 6
2022-11-17 23:51:51 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 7
2022-11-17 23:51:51 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 5
2022-11-17 23:51:51 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 1
2022-11-17 23:51:51 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 4
Traceback (most recent call last):
  File "/home/jason_chou/metaseq/metaseq/distributed/utils.py", line 176, in distributed_init
    from megatron.global_vars import (
ImportError: cannot import name '_GLOBAL_MEMORY_BUFFER' from 'megatron.global_vars' (/home/jason_chou/Megatron-LM/megatron/global_vars.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jason_chou/.conda/envs/user/bin/metaseq-api-local", line 8, in <module>
    sys.exit(cli_main())
  File "/home/jason_chou/metaseq/metaseq/cli/interactive_hosted.py", line 380, in cli_main
    distributed_utils.call_main(cfg, worker_main, namespace_args=args)
  File "/home/jason_chou/metaseq/metaseq/distributed/utils.py", line 283, in call_main
    return _spawn_helper(main, cfg, kwargs)
  File "/home/jason_chou/metaseq/metaseq/distributed/utils.py", line 261, in _spawn_helper
    retval = distributed_main(-1, main, cfg, kwargs)
  File "/home/jason_chou/metaseq/metaseq/distributed/utils.py", line 218, in distributed_main
    cfg.distributed_training.distributed_rank = distributed_init(cfg)
  File "/home/jason_chou/metaseq/metaseq/distributed/utils.py", line 181, in distributed_init
    raise ImportError(
ImportError:
Please install megatron using the setup instructions!
```
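For context, the `constants.py` edits referred to above just point the service at a local checkpoint. A minimal sketch of what mine amount to, with paths and sizes taken from the logs in this report rather than copied verbatim from the linked comment:

```python
# metaseq/service/constants.py (sketch; paths/values are illustrative)
import os

MODEL_PARALLEL = 8    # matches "initializing tensor model parallel with size 8"
TOTAL_WORLD_SIZE = 8  # 8 x V100 in this setup
CHECKPOINT_FOLDER = "/home/jason_chou/redspot_home/66b"
CHECKPOINT_LOCAL = os.path.join(CHECKPOINT_FOLDER, "reshard.pt")
```

The first traceback shows `distributed_init` failing on an import from `megatron.global_vars`. The incompatibility can be checked in isolation; a minimal probe, assuming nothing beyond the name the traceback itself references:

```python
# Probe which Megatron-LM branch is installed: fairseq_v2 lacks
# _GLOBAL_MEMORY_BUFFER, which metaseq's distributed_init imports.
try:
    from megatron.global_vars import _GLOBAL_MEMORY_BUFFER  # noqa: F401
    print("Megatron-LM exposes _GLOBAL_MEMORY_BUFFER (fairseq_v3-compatible)")
except ImportError:
    print("Megatron-LM lacks _GLOBAL_MEMORY_BUFFER (e.g. fairseq_v2)")
```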
- If I roll back to the commit right before #479 (364d7315), `metaseq-api-local` runs, but actually requesting a completion then fails:
```
~/metaseq$ git checkout 364d7315dfe91046fb1b58450edeac67e7d83a10
M	metaseq/service/constants.py
Note: switching to '364d7315dfe91046fb1b58450edeac67e7d83a10'.
(...)
```
```
$ metaseq-api-local
2022-11-18 00:00:43 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 0
2022-11-18 00:00:43 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 7
2022-11-18 00:00:43 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 5
2022-11-18 00:00:43 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 6
2022-11-18 00:00:43 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 2
2022-11-18 00:00:43 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 3
2022-11-18 00:00:43 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 4
2022-11-18 00:00:43 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 1
> initializing tensor model parallel with size 8
> initializing pipeline model parallel with size 1
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2719 and data parallel seed: 1
2022-11-18 00:00:47 | INFO | metaseq.hub_utils | loading model(s) from /home/jason_chou/redspot_home/66b/reshard.pt
2022-11-18 00:01:07 | INFO | metaseq.checkpoint_utils | Done reading from disk
2022-11-18 00:01:13 | INFO | metaseq.checkpoint_utils | Done loading state dict
2022-11-18 00:01:14 | INFO | metaseq.cli.interactive | loaded model 0
2022-11-18 00:01:14 | INFO | metaseq.cli.interactive | Worker engaged! 172.21.45.228:6010
 * Serving Flask app 'metaseq.cli.interactive_hosted' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
2022-11-18 00:01:14 | INFO | werkzeug | WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:6010
 * Running on http://172.21.45.228:6010
2022-11-18 00:01:14 | INFO | werkzeug | Press CTRL+C to quit
```

```
$ curl -k http://localhost:6010/completions -H "Content-Type: application/json" -H "Authorization: authentic" -d '{
    "prompt": "Description: A chair, two beds\nItems mentioned: 1 chair, 2 beds.\nDescription: A carpet, four beds\nItems mentioned: 1 carpet, 4 beds.\nDescription: Outside chair leg broken unrepairable/trash left around entire home\nItems mentioned: 1 chair.\nDescription: 3 rugs - 2 kitchen rugs and the living room rug\nItems mentioned: 3 rugs.",
    "temperature": 1.0,
    "max_tokens": 32, "min_tokens": 4,
    "top_p": 0.9, "n": 1,
    "echo": false, "stop": "\n"
}'| jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2557  100  2093  100   464   2450    543 --:--:-- --:--:-- --:--:--  2990
{
  "error": {
    "code": null,
    "message": "module 'megatron.mpu' has no attribute 'LinearWithGradAccumulationAndAsyncCommunication'",
    "param": null,
    "stacktrace": [
      "  File \"/home/default_user/.conda/envs/user/lib/python3.10/site-packages/flask/app.py\", line 1523, in full_dispatch_request\n    rv = self.dispatch_request()\n",
      "  File \"/home/default_user/.conda/envs/user/lib/python3.10/site-packages/flask/app.py\", line 1509, in dispatch_request\n    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)\n",
      "  File \"/home/jason_chou/metaseq/metaseq/cli/interactive_hosted.py\", line 348, in completions\n    raise generations\n",
      "  File \"/home/jason_chou/metaseq/metaseq/cli/interactive_hosted.py\", line 153, in batching_loop\n    generations = generator.generate(**request_object)\n",
      "  File \"/home/jason_chou/metaseq/metaseq/hub_utils.py\", line 277, in generate\n    translations = self.task.inference_step(generator, self.models, batch)\n",
      "  File \"/home/jason_chou/metaseq/metaseq/tasks/base_task.py\", line 426, in inference_step\n    return generator.generate(models, sample, prefix_tokens=prefix_tokens)\n",
      "  File \"/home/default_user/.conda/envs/user/lib/python3.10/site-packages/torch/autograd/grad_mode.py\", line 27, in decorate_context\n    return func(*args, **kwargs)\n",
      "  File \"/home/jason_chou/metaseq/metaseq/sequence_generator.py\", line 88, in generate\n    return self._generate(sample, **kwargs)\n",
      "  File \"/home/jason_chou/metaseq/metaseq/sequence_generator.py\", line 169, in _generate\n    model_out = self.model.decoder(\n",
      "  File \"/home/default_user/.conda/envs/user/lib/python3.10/site-packages/torch/nn/modules/module.py\", line 1130, in _call_impl\n    return forward_call(*input, **kwargs)\n",
      "  File \"/home/jason_chou/metaseq/metaseq/models/transformer_decoder.py\", line 379, in forward\n    x = self.output_layer(x)\n",
      "  File \"/home/jason_chou/metaseq/metaseq/model_parallel/models/transformer.py\", line 65, in output_layer\n    x = mpu.LinearWithGradAccumulationAndAsyncCommunication.apply(\n"
    ],
    "type": "invalid_request_error"
  }
}
```
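Here the failing call is `metaseq/model_parallel/models/transformer.py` invoking `mpu.LinearWithGradAccumulationAndAsyncCommunication.apply(...)`, an autograd function the `fairseq_v2` branch evidently doesn't define. A one-line probe (a sketch, assuming nothing beyond the attribute named in the error):

```python
# Check whether the installed Megatron-LM defines the autograd function
# that metaseq's model-parallel output_layer relies on.
from megatron import mpu

print(hasattr(mpu, "LinearWithGradAccumulationAndAsyncCommunication"))
# False here on fairseq_v2; expected True on fairseq_v3, given the run below.
```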
- If I roll all the way back to the commit right before #460 (ce294a11), then things work:
```
~/metaseq$ git checkout ce294a115cecf02efb8bae2f26305728d7c05500
M	metaseq/service/constants.py
Previous HEAD position was 364d731 Launch separate sbatch job to copy checkpoints over from scratch to NFS (#494)
HEAD is now at ce294a1 Remove different dimension args (#462)
```
```
$ metaseq-api-local
2022-11-18 00:07:18 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 0
2022-11-18 00:07:18 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 5
2022-11-18 00:07:18 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 3
2022-11-18 00:07:18 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 1
2022-11-18 00:07:18 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 4
2022-11-18 00:07:18 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 6
2022-11-18 00:07:18 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 2
2022-11-18 00:07:18 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 7
> initializing tensor model parallel with size 8
> initializing pipeline model parallel with size 1
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2719 and data parallel seed: 1
2022-11-18 00:07:22 | INFO | metaseq.hub_utils | loading model(s) from /home/jason_chou/redspot_home/66b/reshard.pt
2022-11-18 00:07:42 | INFO | metaseq.checkpoint_utils | Done reading from disk
2022-11-18 00:07:46 | INFO | metaseq.checkpoint_utils | Done loading state dict
2022-11-18 00:07:47 | INFO | metaseq.cli.interactive | loaded model 0
2022-11-18 00:07:48 | INFO | metaseq.cli.interactive | Worker engaged! 172.21.45.228:6010
 * Serving Flask app 'metaseq.cli.interactive_hosted' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
2022-11-18 00:07:48 | INFO | werkzeug | WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:6010
 * Running on http://172.21.45.228:6010
2022-11-18 00:07:48 | INFO | werkzeug | Press CTRL+C to quit
```

```
$ curl -k http://localhost:6010/completions -H "Content-Type: application/json" -H "Authorization: authentic" -d '{
    "prompt": "Description: A chair, two beds\nItems mentioned: 1 chair, 2 beds.\nDescription: A carpet, four beds\nItems mentioned: 1 carpet, 4 beds.\nDescription: Outside chair leg broken unrepairable/trash left around entire home\nItems mentioned: 1 chair.\nDescription: 3 rugs - 2 kitchen rugs and the living room rug\nItems mentioned: 3 rugs.",
    "temperature": 1.0,
    "max_tokens": 32, "min_tokens": 4,
    "top_p": 0.9, "n": 1,
    "echo": false, "stop": "\n"
}'| jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1811  100  1347  100   464    382    131  0:00:03  0:00:03 --:--:--   514
{
  "choices": [
    {
      "logprobs": {
        "finish_reason": "length",
        "text_offset": [
          0,
          2,
          4,
          11,
          15,
          21,
          23,
          26,
          27,
          29,
          33,
          33,
          34,
          35,
          40,
          43,
          45,
          48,
          59,
          63,
          66,
          69,
          77,
          80,
          89,
          91,
          95,
          103,
          106,
          108,
          112,
          114
        ],
        "token_logprobs": [
          -2.8142499923706055,
          -7.563986301422119,
          -5.285521984100342,
          -2.192822217941284,
          -4.636736869812012,
          -1.7162295579910278,
          -0.0005318744806572795,
          -2.6370177268981934,
          -2.1895456314086914,
          -1.9937776327133179,
          -1.9050769805908203,
          -0.002048682188615203,
          -0.00029025712865404785,
          -1.0687462091445923,
          -1.5542668104171753,
          -3.009408950805664,
          -3.9404053688049316,
          -5.934344291687012,
          -1.175774335861206,
          -1.0492457151412964,
          -1.059978723526001,
          -4.073267459869385,
          -2.1524734497070312,
          -3.039748430252075,
          -2.659883499145508,
          -5.401388168334961,
          -2.7808990478515625,
          -0.6962568163871765,
          -0.23176079988479614,
          -0.10268733650445938,
          -2.26662015914917,
          -0.04845426231622696
        ],
        "tokens": [
          " (",
          "My",
          " family",
          " has",
          " three",
          " r",
          "ugs",
          ".",
          " I",
          " don",
          "�",
          "�",
          "t",
          " know",
          " if",
          " I",
          " am",
          " forgetting",
          " one",
          " or",
          " if",
          " someone",
          " is",
          " counting",
          " a",
          " dog",
          " blanket",
          " as",
          " a",
          " rug",
          ").",
          "\n"
        ],
        "top_logprobs": null
      },
      "text": " (My family has three rugs. I don’t know if I am forgetting one or if someone is counting a dog blanket as a rug).\n"
    }
  ],
  "created": 1668730119,
  "id": "0e60c361-808a-4028-a064-3b625a66d36e",
  "model": "/home/jason_chou/redspot_home/66b/",
  "object": "text_completion"
}
```
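For completeness, the same request can be issued from Python; a minimal sketch using the third-party `requests` package (an assumption; any HTTP client works), with the endpoint, headers, and body exactly as in the curl calls above:

```python
# Sketch: the curl request above, issued from Python via `requests`
# (assumed installed; `json=` sets the Content-Type header automatically).
import requests

resp = requests.post(
    "http://localhost:6010/completions",
    headers={"Authorization": "authentic"},  # same token as the curl call
    json={
        "prompt": "Description: A chair, two beds\nItems mentioned: 1 chair, 2 beds.\nDescription: A carpet, four beds\nItems mentioned: 1 carpet, 4 beds.\nDescription: Outside chair leg broken unrepairable/trash left around entire home\nItems mentioned: 1 chair.\nDescription: 3 rugs - 2 kitchen rugs and the living room rug\nItems mentioned: 3 rugs.",
        "temperature": 1.0,
        "max_tokens": 32,
        "min_tokens": 4,
        "top_p": 0.9,
        "n": 1,
        "echo": False,
        "stop": "\n",
    },
)
# Mirrors the jq output: either a "choices" list or an "error" object.
print(resp.json())
```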
- Alternatively, the current `main` also works with the `fairseq_v3` branch of Megatron-LM:
```
~/metaseq$ git checkout main
M	metaseq/service/constants.py
Previous HEAD position was ce294a1 Remove different dimension args (#462)
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
$ cd ../Megatron-LM/
~/Megatron-LM$ git checkout fairseq_v3
Switched to branch 'fairseq_v3'
Your branch is up to date with 'origin/fairseq_v3'.
```
```
$ metaseq-api-local
2022-11-18 00:14:15 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 0
2022-11-18 00:14:15 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 7
2022-11-18 00:14:15 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 2
2022-11-18 00:14:15 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 5
2022-11-18 00:14:15 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 4
2022-11-18 00:14:15 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 3
2022-11-18 00:14:15 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 1
2022-11-18 00:14:15 | INFO | metaseq.distributed.utils | initialized host i-0bf8e5569aa4999be as rank 6
> initializing tensor model parallel with size 8
> initializing pipeline model parallel with size 1
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2719 and data parallel seed: 1
2022-11-18 00:14:19 | INFO | metaseq.hub_utils | loading model(s) from /home/jason_chou/redspot_home/66b/reshard.pt
2022-11-18 00:14:40 | INFO | metaseq.checkpoint_utils | Done reading from disk
2022-11-18 00:14:45 | INFO | metaseq.checkpoint_utils | Done loading state dict
2022-11-18 00:14:46 | INFO | metaseq.cli.interactive | loaded model 0
2022-11-18 00:14:47 | INFO | metaseq.cli.interactive | Worker engaged! 172.21.45.228:6010
 * Serving Flask app 'metaseq.cli.interactive_hosted' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
2022-11-18 00:14:47 | INFO | werkzeug | WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:6010
 * Running on http://172.21.45.228:6010
2022-11-18 00:14:47 | INFO | werkzeug | Press CTRL+C to quit
```

```
$ curl -k http://localhost:6010/completions -H "Content-Type: application/json" -H "Authorization: authentic" -d '{
    "prompt": "Description: A chair, two beds\nItems mentioned: 1 chair, 2 beds.\nDescription: A carpet, four beds\nItems mentioned: 1 carpet, 4 beds.\nDescription: Outside chair leg broken unrepairable/trash left around entire home\nItems mentioned: 1 chair.\nDescription: 3 rugs - 2 kitchen rugs and the living room rug\nItems mentioned: 3 rugs.",
    "temperature": 1.0,
    "max_tokens": 32, "min_tokens": 4,
    "top_p": 0.9, "n": 1,
    "echo": false, "stop": "\n"
}'| jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   935  100   471  100   464    331    326  0:00:01  0:00:01 --:--:--   658
{
  "choices": [
    {
      "logprobs": {
        "finish_reason": "length",
        "text_offset": [
          0,
          3,
          9,
          15,
          22,
          23
        ],
        "token_logprobs": [
          -4.556225776672363,
          -3.6027157306671143,
          -1.8605602979660034,
          -2.4117279052734375,
          -0.19675344228744507,
          -0.05910465121269226
        ],
        "tokens": [
          " No",
          " other",
          " items",
          " listed",
          ".",
          "\n"
        ],
        "top_logprobs": null
      },
      "text": " No other items listed.\n"
    }
  ],
  "created": 1668730514,
  "id": "e856b3cc-d46a-491c-a5ab-c440e6bac510",
  "model": "/home/jason_chou/redspot_home/66b/",
  "object": "text_completion"
}
```
It seems that we should either instruct people to just use the `fairseq_v3` branch of Megatron-LM, or roll back #460 and #479.
## Expected behavior

`metaseq-api-local` just works.
## Environment

- metaseq Version: 4e1592ae (current `main`)
- PyTorch Version: 1.12.1+cu113
- OS: Ubuntu 18.04.6 LTS
- How you installed metaseq: pip
- Build command you used (if compiling from source): N/A
- Python version: 3.10
- CUDA/cuDNN version: CUDA 11.8
- GPU models and configuration: 8 x V100 SXM2 32 GB