Created by: tangbinh
Summary
Add a new script to collect latency results for OPT models during generation. While this script resembles the existing metaseq/scripts/generation_benchmarks.py, it is more general: besides latency, we also collect memory usage and GPU traces for various configurations of batch size, input length, and output length. We also use the GeneratorInterface directly and skip the checkpoint downloading part.
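
For reviewers, the configuration sweep described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `sweep_latency` and `fake_generate` are hypothetical names, the generation call is a stand-in rather than the real GeneratorInterface API, and memory usage and GPU traces are left out.

```python
import itertools
import statistics
import time
from typing import Callable, Dict, List, Sequence


def sweep_latency(
    generate_fn: Callable[[List[List[int]], int], object],
    batch_sizes: Sequence[int],
    input_lengths: Sequence[int],
    output_lengths: Sequence[int],
    warmup: int = 1,
    repeats: int = 3,
) -> List[Dict[str, float]]:
    """Time generate_fn over every (batch size, input length, output length) combination.

    generate_fn is a hypothetical callable taking (prompts, max_new_tokens);
    in a real benchmark it would wrap the model's generation entry point.
    """
    results = []
    for bsz, in_len, out_len in itertools.product(
        batch_sizes, input_lengths, output_lengths
    ):
        # Dummy prompts: each one is in_len copies of token id 0.
        prompts = [[0] * in_len for _ in range(bsz)]

        # Warm-up runs are executed but not timed.
        for _ in range(warmup):
            generate_fn(prompts, out_len)

        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            generate_fn(prompts, out_len)
            timings.append(time.perf_counter() - start)

        results.append(
            {
                "batch_size": bsz,
                "input_length": in_len,
                "output_length": out_len,
                "median_latency_s": statistics.median(timings),
            }
        )
    return results


def fake_generate(prompts: List[List[int]], max_new_tokens: int) -> List[List[int]]:
    """Stand-in for a model: appends max_new_tokens dummy tokens to each prompt."""
    return [prompt + [1] * max_new_tokens for prompt in prompts]


if __name__ == "__main__":
    for row in sweep_latency(fake_generate, [1, 2], [4], [2, 8]):
        print(row)
```

On a GPU, the timed region would likely also need a device synchronization before each clock read, since CUDA kernels launch asynchronously; the sketch omits this to stay dependency-free.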