david/flash-attention

mirror of https://github.com/Dao-AILab/flash-attention

Author	SHA1 Message	Date
Tri Dao	ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache	1 year ago
Tri Dao	bb9beb3645 Remove some unused headers	1 year ago
Tri Dao	ee77b931b9 Swap seqlen_q and nheads for MQA to speed it up (h/t Daniel Haziza)	1 year ago
Tri Dao	37c6e05406 Implement flash_attn_with_kvcache	1 year ago
Tri Dao	b1fbbd8337 Implement splitKV attention	1 year ago
Kirthi Shankar Sivamani	a03f6f8e9e Enable CUDA graphs (#386)	1 year ago
Tri Dao	4f285b3547 FlashAttention-2 release	1 year ago