Commit History

Author                   SHA1        Message                                                              Date
Tri Dao                  ccbb14f38e  Implement rotary embedding in flash_attn_with_kvcache                1 year ago
Tri Dao                  bb9beb3645  Remove some unused headers                                           1 year ago
Tri Dao                  ee77b931b9  Swap seqlen_q and nheads for MQA to speed it up (h/t Daniel Haziza)  1 year ago
Tri Dao                  37c6e05406  Implement flash_attn_with_kvcache                                    1 year ago
Tri Dao                  b1fbbd8337  Implement splitKV attention                                          1 year ago
Kirthi Shankar Sivamani  a03f6f8e9e  Enable CUDA graphs (#386)                                            1 year ago
Tri Dao                  4f285b3547  FlashAttention-2 release                                             1 year ago