david/flash-attention: flash-attention from https://github.com/Dao-AILab/flash-attention @ 0dfb28174333d9eefb7c1dd4292690a8458d1e89

This CUDA extension implements fused dropout + residual + LayerNorm, building on Apex's FastLayerNorm. Major changes:

If you want to use it for hidden dimensions larger than 8k (8192), please file an issue.

This extension has only been tested on A100s.
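
For readers unfamiliar with the operation, here is a minimal, unfused reference sketch in plain Python of what "dropout + residual + LayerNorm" computes for a single row of hidden features. It is not this extension's API: the function name `dropout_add_layer_norm_ref`, its parameters, and the choice to also return the pre-normalization sum are all assumptions made for illustration. The CUDA extension performs the equivalent steps in a fused kernel rather than as separate passes.

```python
import math
import random


def dropout_add_layer_norm_ref(x0, residual, weight, bias, p,
                               eps=1e-5, training=True, rng=None):
    """Unfused reference for one row (hypothetical helper, not the extension's API).

    Computes LayerNorm(dropout(x0) + residual) and also returns the
    pre-normalization sum.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout probability p must be in [0, 1)")
    rng = rng or random.Random(0)
    keep = 1.0 - p

    # Step 1: inverted dropout on x0 (kept values are rescaled by 1 / keep).
    if training and p > 0.0:
        dropped = [v / keep if rng.random() < keep else 0.0 for v in x0]
    else:
        dropped = [float(v) for v in x0]

    # Step 2: add the residual branch.
    summed = [d + r for d, r in zip(dropped, residual)]

    # Step 3: LayerNorm over the hidden dimension (biased variance).
    n = len(summed)
    mean = sum(summed) / n
    var = sum((s - mean) ** 2 for s in summed) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [(s - mean) * inv_std * w + b
           for s, w, b in zip(summed, weight, bias)]
    return out, summed


if __name__ == "__main__":
    hidden = 4
    out, summed = dropout_add_layer_norm_ref(
        x0=[1.0, 2.0, 3.0, 4.0],
        residual=[0.5] * hidden,
        weight=[1.0] * hidden,
        bias=[0.0] * hidden,
        p=0.1,
    )
    print(out)
    print(summed)
```

A reference like this is mainly useful for checking a fused implementation's output numerically on small inputs with dropout disabled (`p=0.0`), where the result is deterministic.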

To install the extension, run the following from the repository root:

cd csrc/layer_norm && pip install .

As of 2024-01-05, this extension is no longer used in the FlashAttention repo. We've instead switched to a Triton-based implementation.