Created by: stephenroller
Patch Description: Incremental decoding is currently broken in the original (regular) version of MHA, but not in the model parallel version. The fix was applied to MP-MHA at some point, but that commit got lost in the squash history, and I never implemented it in regular MHA. This patch applies the same fix there.
Testing steps: Generating on the cm3 branch no longer crashes when using a consolidated checkpoint.