Created by: KUNAL1612
Patch Description
Moved the future mask to CUDA so that all operations for document attention also take place on CUDA. See issue #285 for context. The speedup from this change is marginal per call, but it can accumulate over many runs.
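A minimal sketch of the idea, assuming a `buffered_future_mask`-style helper (the name and signature here are illustrative, not the exact code in this patch): the causal/future mask is allocated directly on the input tensor's device instead of on CPU, so the later document-attention masking never leaves the GPU.

```python
import torch

def buffered_future_mask(tensor: torch.Tensor) -> torch.Tensor:
    """Build a causal (future) mask on the same device as `tensor`.

    Hypothetical sketch: allocating the mask with `device=tensor.device`
    avoids a CPU-side allocation followed by a host-to-device copy.
    """
    dim = tensor.size(1)
    # Upper-triangular -inf mask; positions j > i are blocked.
    future_mask = torch.triu(
        torch.full((dim, dim), float("-inf"), device=tensor.device),
        diagonal=1,
    )
    return future_mask
```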
To test, I set the attn doc separator to a random number to force execution of this branch, and timed it using the methods from #220 (closed). Over 436 calls to the function, it takes 0.42 seconds on CPU vs 0.09 seconds on GPU.
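For reference, a hedged sketch of how such a measurement can be taken (the harness below is illustrative and not the exact script used in #220): synchronizing around the timed loop ensures queued GPU kernels are counted in the elapsed time.

```python
import time
import torch

def time_mask_fn(fn, tensor: torch.Tensor, n_calls: int = 436) -> float:
    """Time `fn(tensor)` over `n_calls` iterations, syncing CUDA if needed."""
    if tensor.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_calls):
        fn(tensor)
    if tensor.is_cuda:
        torch.cuda.synchronize()
    return time.perf_counter() - start
```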