Created by: stephenroller
Patch Description
Leaving this mostly for record keeping, as the experiments here were unsuccessful. Adds support for learned ALiBi embeddings and, in a previous version, simple offset lookups. Includes the changes needed to propagate gradients through the learned positional offsets.
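For anyone reading this later without the ALiBi background, here is a minimal pure-Python sketch of the standard fixed ALiBi bias. It is not the code in this patch. The helper names `alibi_slopes` and `alibi_bias` are hypothetical, and the sketch only handles power-of-two head counts. A learned variant would presumably treat the per-head slopes as trainable parameters rather than constants; that is an assumption about the general idea, not a description of this diff.

```python
def alibi_slopes(num_heads):
    """Fixed ALiBi slopes for a power-of-two head count.

    Slopes form a geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    Hypothetical helper for illustration only.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(slopes, seq_len):
    """Per-head causal bias added to attention scores before softmax.

    bias[h][i][j] = -slopes[h] * (i - j) for keys j <= query i,
    and -inf for future positions (causal mask).
    """
    neg_inf = float("-inf")
    return [
        [
            [-m * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in slopes
    ]


# Example: 8 heads, sequence length 4.
slopes = alibi_slopes(8)
bias = alibi_bias(slopes, 4)
```

The bias depends only on the distance between query and key, so no position embedding is added to the token embeddings. In a learned setup the values in `slopes` would be the quantities that receive gradients.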
Testing steps
Ran a 7B run with --alibi.