Commit History

Author SHA1 Message Date
Antoni Viros 83e41b3ca4 Add custom ops for compatibility with PT Compile (#1139) 3 months ago
youkaichao ef3e358a25 remove lambda (#1056) 5 months ago
Tri Dao 898dd4bbf2 Pass seqused_k to _flash_attn_varlen_forward 5 months ago
Tri Dao 40e534a7f6 Implement cache_leftpad 6 months ago
Tri Dao 81e01efd4b More typo fixes 6 months ago
Tri Dao 72e27c6320 Fix typo with softcapping 6 months ago
Phil Wang f4628b43ec missing commas and backwards return arguments (#1032) 6 months ago
Nicolas Patry 8f873cc6ac Implement softcapping. (#1025) 6 months ago
Jianwei Dong 4e8d60069f Add the return_softmax_lse parameter to the flash_attn_with_kvcache function to allow returning the logsumexp of the attention scores. (#989) 6 months ago
Grigory Sizov f816dee63c Support unpadded LSE layout (#970) 6 months ago
Grigory Sizov 2a15840f09 Enable paged attention in varlen forward (#831) 9 months ago
Tao He 204c3c6d1b Fixes an error in comment (#785) 11 months ago
Tri Dao 54e80a3829 Implement page KV cache 11 months ago
Tri Dao a7b66ae25a Simplify writing softmax to gmem 1 year ago
Tri Dao 732654583c Implement deterministic backward (thanks to Meituan) 1 year ago
Tri Dao 5ab9b3667b Clean up alibi, implement non-causal alibi 1 year ago
Tri Dao bc28eacc60 Format flash_attn_interface.py 1 year ago
Sanghun Cho e4f726fc44 Support alibi, by Sanghun Cho from Kakao Brain 1 year ago
Tri Dao d4a7c8ffbb [CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2, add torch-nightly 1 year ago
Jeremy Reizenstein ce3e7280f8 Allow varlen_fwd to take optional seqused_k (#647) 1 year ago
Tri Dao e279bf8ed9 [Gen] Accept cache_batch_idx to index into the KV cache 1 year ago
Tri Dao 083e8f525f Implement local attention 1 year ago
Tri Dao ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache 1 year ago
Tri Dao ee77b931b9 Swap seqlen_q and nheads for MQA to speed it up (h/t Daniel Haziza) 1 year ago
Tri Dao fd20f16a4e Support cache_seqlens being integer 1 year ago
Tri Dao 37c6e05406 Implement flash_attn_with_kvcache 1 year ago
Tri Dao 9e5e8bc91e Change causal mask to be aligned to bottom-right instead of top-left 1 year ago
Tri Dao d431f16751 Import torch before flash_attn_2_cuda 1 year ago
Tri Dao f1a73d0740 Run isort and black on python files 1 year ago
Tri Dao 8f4cd4c16b [Docs] Fix docstring about Q nheads being divisible by KV nheads 1 year ago