Commit History

Author SHA1 Message Date
  Ying Zhang cdbbe844b1 minor changes to unpad_input test util func 3 months ago
  Tri Dao 299563626f Fix test with alibi and cache_leftpad 4 months ago
  Tri Dao 751c762c9c Don't specialize for hdim 224 to speed up compilation 4 months ago
  Phil Wang 5f1ae4a34b backwards for softcapping (#1033) 4 months ago
  Tri Dao 40e534a7f6 Implement cache_leftpad 5 months ago
  Tri Dao d0787acc16 Relax dropout_fraction test 5 months ago
  Tri Dao dca6d89da4 Don't support softcap and dropout at the same time 5 months ago
  Tri Dao 81e01efd4b More typo fixes 5 months ago
  Tri Dao 3d41db3e2c Only test backward if there's no softcapping 5 months ago
  Nicolas Patry 8f873cc6ac Implement softcapping. (#1025) 5 months ago
  muoshuosha 6df7e0a02e Fix the varlen deterministic test (#1023) 5 months ago
  cao lei 6a2a16e994 fix typo (#974) 5 months ago
  Grigory Sizov f816dee63c Support unpadded LSE layout (#970) 5 months ago
  Grigory Sizov 2a15840f09 Enable paged attention in varlen forward (#831) 9 months ago
  Tri Dao 2406f28805 Enable headdim 256 backward on consumer GPUs (Ampere, Ada) 9 months ago
  Tri Dao 54e80a3829 Implement page KV cache 10 months ago
  Tri Dao 10dad61277 apply_dropout now takes tensor of rowcol layout 11 months ago
  Tri Dao a7b66ae25a Simplify writing softmax to gmem 11 months ago
  Tri Dao 732654583c Implement deterministic backward (thanks to Meituan) 11 months ago
  Tri Dao 5ab9b3667b Clean up alibi, implement non-causal alibi 11 months ago
  Tri Dao e279bf8ed9 [Gen] Accept cache_batch_idx to index into the KV cache 1 year ago
  Tri Dao 083e8f525f Implement local attention 1 year ago
  Tri Dao 65c234ed90 Don't over-allocate dq_accum in case of varlen 1 year ago
  Tri Dao 2d8ea9a530 Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza) 1 year ago
  Tri Dao 3250ff3d82 Swap seqlen_q, nheads for MQA when seqlen_q=1 for fwd (h/t Daniel H) 1 year ago
  Tri Dao ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache 1 year ago
  Tri Dao 56b7fc6ee0 Simplify the implementation of KVcache attn by appending KV first 1 year ago
  Tri Dao 37c6e05406 Implement flash_attn_with_kvcache 1 year ago
  Tri Dao 0c04943fa2 Require CUDA 11.6+, clean up setup.py 1 year ago
  Tri Dao b1fbbd8337 Implement splitKV attention 1 year ago