Created by: stephenroller
Patch Description: For a few weeks now, the API has consistently returned a positive value for the first logit after the prompt, which cannot be a valid log-probability.
Digging into it, the cause is that beam search keeps track of cumulative NLL (which makes sense for beam search). However, the first step of the beam search was being given logits only for the newest token, with no running total for the tokens before it. As a result, the cumulative logit logic applied a "reset to 0" offset on the first generated token.
Adding a new one-time-only parameter is a little bit kludgy, but it's the best way to provide this information to the search algorithm without gutting it all.
This also makes the code that writes into the scores slightly more compact.
Note that generations are unchanged from before. Only the bookkeeping of logits changes.
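To illustrate the bookkeeping issue described above, here is a toy sketch in plain Python. It is not the code from this patch: the function names, the `prior` argument (a hypothetical stand-in for the one-time-only parameter), and all numbers are made up, and it assumes the tracked quantity is a cumulative log-probability that is later differenced to recover per-token scores.

```python
from itertools import accumulate

def per_token_scores(cumulative):
    """Recover per-token log-probs by differencing a cumulative sequence."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# Toy values, not taken from any real model output.
prompt_logprobs = [-1.0, -2.0, -1.5]                   # per-token log-probs of the prompt
prompt_cumulative = list(accumulate(prompt_logprobs))  # [-1.0, -3.0, -4.5]
first_step_logprobs = {"a": -0.5, "b": -2.0}           # candidates for the first new token

def first_step(logprobs, prior=0.0):
    """Pick the best candidate and return (token, cumulative score).

    `prior` is a hypothetical stand-in for the one-time-only parameter:
    the cumulative score of everything that came before this step.
    """
    token = max(logprobs, key=logprobs.get)
    return token, prior + logprobs[token]

# Buggy bookkeeping: the first step starts from an offset of 0.
tok_bug, cum_bug = first_step(first_step_logprobs)
scores_bug = per_token_scores(prompt_cumulative + [cum_bug])

# Fixed bookkeeping: the first step is seeded with the prompt's cumulative score.
tok_fix, cum_fix = first_step(first_step_logprobs, prior=prompt_cumulative[-1])
scores_fix = per_token_scores(prompt_cumulative + [cum_fix])

print(tok_bug, scores_bug[-1])  # a 4.0   (positive, so not a valid log-prob)
print(tok_fix, scores_fix[-1])  # a -0.5
```

Because the prior is the same constant for every candidate at that step, the chosen token is identical in both cases, which is consistent with generations not changing.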
Testing steps: Ran generations with and without topp, and with and without batching. Confirmed that greedy generations stay the same.