Create generation benchmarks on different model sizes

Created by: punitkoura

🚀 Feature Request

The goal of this task is to track the generation speed, in words per second (WPS), of the various OPT model sizes. We can also add profiling to identify bottlenecks in the generation process. This investigation should help us speed up generation.

Motivation

To improve the speed of text generation from OPT models.

Pitch

Create a generation benchmark that takes any of our models, runs a fixed generation workload, and reports the time per token. The model name should be configurable.
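
As a starting point for discussion, here is a minimal sketch of what such a harness could look like. It is not the project's actual API: `dummy_generate`, `benchmark_generation`, `parse_args`, the `--model-name` / `--max-new-tokens` flags, and the default label `125m` are all hypothetical names chosen for illustration, and the dummy generator simply sleeps to simulate per-token cost. A real version would swap in a call to the model being benchmarked.

```python
import argparse
import time
from typing import Callable, Dict, List, Optional


def dummy_generate(prompt: str, max_new_tokens: int) -> List[int]:
    """Hypothetical stand-in for a real model's generation call."""
    tokens = []
    for i in range(max_new_tokens):
        time.sleep(0.001)  # simulate a fixed per-token decoding cost
        tokens.append(i)
    return tokens


def benchmark_generation(
    generate_fn: Callable[[str, int], List[int]],
    prompt: str,
    max_new_tokens: int = 32,
    warmup: int = 1,
    runs: int = 3,
) -> Dict[str, float]:
    """Time a fixed generation workload and report per-token timing."""
    # Warm-up runs are excluded so one-time setup cost does not skew results.
    for _ in range(warmup):
        generate_fn(prompt, max_new_tokens)

    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate_fn(prompt, max_new_tokens)
        # Note: for a GPU model, the device would need to be synchronized
        # (e.g. torch.cuda.synchronize()) before reading the clock here.
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)

    if total_tokens == 0 or total_seconds <= 0.0:
        raise ValueError("benchmark produced no tokens or no measurable time")

    return {
        "tokens_generated": total_tokens,
        "seconds_per_token": total_seconds / total_tokens,
        "tokens_per_second": total_tokens / total_seconds,
    }


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generation benchmark sketch")
    # The model name is only a label in this sketch; a real harness would
    # use it to decide which checkpoint to load.
    parser.add_argument("--model-name", default="125m")
    parser.add_argument("--max-new-tokens", type=int, default=32)
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    result = benchmark_generation(
        dummy_generate,
        prompt="A fixed prompt used for every run.",
        max_new_tokens=args.max_new_tokens,
    )
    print(
        f"model={args.model_name} "
        f"tokens={result['tokens_generated']} "
        f"sec/token={result['seconds_per_token']:.6f} "
        f"tokens/sec={result['tokens_per_second']:.1f}"
    )
```

Calling `main(["--model-name", "125m", "--max-new-tokens", "16"])` prints a one-line report for that label. Passing the generation call in as a function keeps the timing loop independent of how a model is loaded, so the same loop could be reused across model sizes. Note that this sketch counts generated tokens, so reporting WPS would require whatever definition of "word" the existing logging uses.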

Alternatives

N/A

Additional context

N/A