Created by: lilisierrayu
Patch Description Add a bunch of warnings for when generation arguments are reset or ignored. When running evaluation through the API, these adjustments currently happen silently, which changes the evaluation configuration and causes issues. The cases covered include:
- For robust inference (especially to avoid OOM), interactive_hosted sets beam to MAX_BEAM when the requested beam is too large.
- Input is truncated to fit within max_positions.
- Top-k decoding has been removed, so sampling_topk is ignored.
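
The three cases above could be sketched roughly as follows. This is an illustrative sketch only, not the code in this patch: the function name `sanitize_generation_args`, the value of `MAX_BEAM`, and the choice to keep the last `max_positions` tokens when truncating are all assumptions made for the example.

```python
import logging
from typing import List, Optional, Tuple

logger = logging.getLogger(__name__)

# Hypothetical cap; the real MAX_BEAM value is not stated in this description.
MAX_BEAM = 16


def sanitize_generation_args(
    tokens: List[int],
    beam: int,
    max_positions: int,
    sampling_topk: Optional[int] = None,
    max_beam: int = MAX_BEAM,
) -> Tuple[List[int], int, List[str]]:
    """Clamp or drop generation arguments and warn about every change.

    Returns the (possibly truncated) tokens, the (possibly capped) beam,
    and the list of warning messages that were emitted.
    """
    messages: List[str] = []

    # Case 1: cap the beam size instead of risking OOM.
    if beam > max_beam:
        messages.append(f"beam={beam} exceeds MAX_BEAM={max_beam}; using {max_beam}")
        beam = max_beam

    # Case 2: truncate the input so it fits in max_positions.
    # Assumption: keep the most recent tokens; the patch may truncate differently.
    if len(tokens) > max_positions:
        messages.append(
            f"input length {len(tokens)} exceeds max_positions={max_positions}; truncating"
        )
        tokens = tokens[-max_positions:]

    # Case 3: top-k decoding is gone, so a requested sampling_topk has no effect.
    if sampling_topk is not None and sampling_topk > 0:
        messages.append(f"sampling_topk={sampling_topk} is ignored")

    for message in messages:
        logger.warning(message)

    return tokens, beam, messages
```

Returning the messages alongside the adjusted values is one possible design: it lets an API handler echo them back to the caller, so the changed configuration is visible in the response and not only in server logs.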
Testing steps Describe how you tested your changes