Create generation benchmarks on different model sizes
Created by: punitkoura
🚀 Feature Request
The goal of this task is to track the generation speed, in words per second (WPS), of the various OPT model sizes. We can also add profiling to identify potential bottlenecks in the generation process. This investigation should help us speed up generation.
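As a rough illustration of the profiling idea, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The `decode_step` and `generate` functions are hypothetical stand-ins, not the project's actual generation code, and `cProfile` only sees Python-level CPU time, so GPU kernels would likely need a GPU-aware profiler instead.

```python
import cProfile
import io
import pstats
import time
from typing import List


def decode_step() -> str:
    """Hypothetical per-token work, standing in for one model forward pass."""
    time.sleep(0.001)  # simulated decoding latency (assumption)
    return "tok"


def generate(max_tokens: int) -> List[str]:
    """Hypothetical generation loop that produces one token per step."""
    return [decode_step() for _ in range(max_tokens)]


def profile_generation(max_tokens: int = 16, top_n: int = 10) -> str:
    """Profile one generation call and return a text report.

    The report lists up to top_n functions sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    generate(max_tokens)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top_n)
    return buffer.getvalue()
```

Calling `print(profile_generation())` shows which functions dominate the run; with a real model, the same wrapper could be placed around the actual generation call.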
Motivation
To improve the speed of text generation from OPT models.
Pitch
Create a generation benchmark that takes any of our models, runs a fixed generation, and reports the time per token. The model name should be configurable.
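One possible shape for such a benchmark is sketched below. Everything here is an assumption for illustration: `MODEL_REGISTRY`, `dummy_generate`, `run_benchmark`, and the `--model-name` and `--max-tokens` flags are hypothetical names, and the dummy generator stands in for a real OPT model. Speed is reported per token; converting to WPS would depend on how words are counted.

```python
import argparse
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

# A generate function takes a prompt and a token budget and returns tokens.
GenerateFn = Callable[[str, int], List[str]]


@dataclass
class BenchmarkResult:
    model_name: str
    tokens_generated: int
    elapsed_seconds: float

    @property
    def seconds_per_token(self) -> float:
        return self.elapsed_seconds / self.tokens_generated

    @property
    def tokens_per_second(self) -> float:
        return self.tokens_generated / self.elapsed_seconds


def dummy_generate(prompt: str, max_tokens: int) -> List[str]:
    """Hypothetical stand-in for a real model's generation call."""
    tokens = []
    for i in range(max_tokens):
        time.sleep(0.001)  # simulated per-token latency (assumption)
        tokens.append(f"tok{i}")
    return tokens


# Hypothetical registry mapping a configurable model name to its generator.
MODEL_REGISTRY: Dict[str, GenerateFn] = {"dummy": dummy_generate}


def run_benchmark(
    model_name: str,
    prompt: str = "Hello",
    max_tokens: int = 32,
    warmup_runs: int = 1,
    timed_runs: int = 3,
) -> BenchmarkResult:
    """Run a fixed generation several times and time it per token."""
    generate = MODEL_REGISTRY[model_name]
    for _ in range(warmup_runs):  # untimed runs to exclude one-off setup cost
        generate(prompt, max_tokens)

    total_tokens = 0
    # For a GPU model, the device would need to be synchronized before
    # each clock read so that queued work is included in the timing.
    start = time.perf_counter()
    for _ in range(timed_runs):
        total_tokens += len(generate(prompt, max_tokens))
    elapsed = time.perf_counter() - start

    if total_tokens == 0:
        raise ValueError("no tokens were generated, cannot compute timing")
    return BenchmarkResult(model_name, total_tokens, elapsed)


def main(argv: List[str]) -> BenchmarkResult:
    """Parse the (hypothetical) command-line flags and print a summary."""
    parser = argparse.ArgumentParser(description="Generation benchmark sketch")
    parser.add_argument("--model-name", default="dummy",
                        choices=sorted(MODEL_REGISTRY))
    parser.add_argument("--max-tokens", type=int, default=32)
    args = parser.parse_args(argv)

    result = run_benchmark(args.model_name, max_tokens=args.max_tokens)
    print(f"{result.model_name}: "
          f"{result.seconds_per_token * 1000:.2f} ms/token, "
          f"{result.tokens_per_second:.1f} tokens/s")
    return result
```

For example, `main(["--model-name", "dummy", "--max-tokens", "8"])` runs the dummy model, and a script entry point could pass `sys.argv[1:]` instead. Keeping the prompt and token budget fixed across runs is what would make results comparable between model sizes.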
Alternatives
N/A
Additional context
N/A