This CUDA extension wraps the single-query attention kernel from FasterTransformer v5.2.1 for benchmarking purposes. To install:
```sh
cd csrc/ft_attention && pip install .
```
As of 2023-09-17, this extension is no longer used in the FlashAttention repo. FlashAttention now implements `flash_attn_with_kvcache`, which provides all the features of this `ft_attention` kernel (and more).
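
For reference, here is a minimal sketch of calling `flash_attn_with_kvcache` in the single-query (decoding) setting that this extension covered, assuming the `flash-attn` package (v2.2 or later) is installed; the tensor shapes and sequence lengths are illustrative only.

```python
# Sketch of the replacement API, assuming flash-attn >= 2.2 is installed.
# Shapes follow the (batch, seqlen, nheads, headdim) convention used by flash-attn;
# adjust nheads/headdim/seqlens to your model.
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim = 2, 8, 64
max_seqlen_k, seqlen_new = 1024, 1  # single new query token per step (decoding)

device, dtype = "cuda", torch.float16
q = torch.randn(batch, seqlen_new, nheads, headdim, device=device, dtype=dtype)
k_cache = torch.zeros(batch, max_seqlen_k, nheads, headdim, device=device, dtype=dtype)
v_cache = torch.zeros(batch, max_seqlen_k, nheads, headdim, device=device, dtype=dtype)
k_new = torch.randn(batch, seqlen_new, nheads, headdim, device=device, dtype=dtype)
v_new = torch.randn(batch, seqlen_new, nheads, headdim, device=device, dtype=dtype)
# Current length of each sequence already in the cache (before appending k_new/v_new).
cache_seqlens = torch.full((batch,), 128, dtype=torch.int32, device=device)

# Appends k_new/v_new into k_cache/v_cache in place and returns the attention
# output of shape (batch, seqlen_new, nheads, headdim).
out = flash_attn_with_kvcache(
    q, k_cache, v_cache, k=k_new, v=v_new, cache_seqlens=cache_seqlens, causal=True
)
```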