Commit history

Author SHA1 Message Date
XiaobingZhang 0dfb281743 don't save inputs buffer of FlashAttenFunc to reduce memory usage for inference mode (#1383) 2 days ago
Michael Melesse b518517cb8 [AMD] Triton Backend for ROCm (#1203) 1 week ago
Antoni Viros 83e41b3ca4 Add custom ops for compatibility with PT Compile (#1139) 2 months ago
youkaichao ef3e358a25 remove lambda (#1056) 4 months ago
Tri Dao 898dd4bbf2 Pass seqused_k to _flash_attn_varlen_forward 5 months ago
Tri Dao 40e534a7f6 Implement cache_leftpad 5 months ago
Tri Dao 81e01efd4b More typo fixes 5 months ago
Tri Dao 72e27c6320 Fix typo with softcapping 5 months ago
Phil Wang f4628b43ec missing commas and backwards return arguments (#1032) 5 months ago
Nicolas Patry 8f873cc6ac Implement softcapping. (#1025) 5 months ago
Jianwei Dong 4e8d60069f Add the return_softmax_lse parameter to the flash_attn_with_kvcache function to allow returning the logsumexp of the attention scores. (#989) 5 months ago
Grigory Sizov f816dee63c Support unpadded LSE layout (#970) 5 months ago
Grigory Sizov 2a15840f09 Enable paged attention in varlen forward (#831) 9 months ago
Tao He 204c3c6d1b Fixes an error in comment (#785) 10 months ago
Tri Dao 54e80a3829 Implement page KV cache 10 months ago
Tri Dao a7b66ae25a Simplify writing softmax to gmem 11 months ago
Tri Dao 732654583c Implement deterministic backward (thanks to Meituan) 11 months ago
Tri Dao 5ab9b3667b Clean up alibi, implement non-causal alibi 11 months ago
Tri Dao bc28eacc60 Format flash_attn_interface.py 1 year ago
Sanghun Cho e4f726fc44 Support alibi, by Sanghun Cho from Kakao Brain 1 year ago
Tri Dao d4a7c8ffbb [CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2, add torch-nightly 1 year ago
Jeremy Reizenstein ce3e7280f8 Allow varlen_fwd to take optional seqused_k (#647) 1 year ago
Tri Dao e279bf8ed9 [Gen] Accept cache_batch_idx to index into the KV cache 1 year ago
Tri Dao 083e8f525f Implement local attention 1 year ago
Tri Dao ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache 1 year ago
Tri Dao ee77b931b9 Swap seqlen_q and nheads for MQA to speed it up (h/t Daniel Haziza) 1 year ago
Tri Dao fd20f16a4e Support cache_seqlens being integer 1 year ago
Tri Dao 37c6e05406 Implement flash_attn_with_kvcache 1 year ago
Tri Dao 9e5e8bc91e Change causal mask to be aligned to bottom-right instead of top-left 1 year ago
Tri Dao d431f16751 Import torch before flash_attn_2_cuda 1 year ago