Commit history

Author SHA1 Message Date
  Ying Zhang cdbbe844b1 minor changes to unpad_input test util func 3 months ago
  Tri Dao 299563626f Fix test with alibi and cache_leftpad 4 months ago
  Tri Dao 751c762c9c Don't specialize for hdim 224 to speed up compilation 4 months ago
  Phil Wang 5f1ae4a34b backwards for softcapping (#1033) 4 months ago
  Tri Dao 40e534a7f6 Implement cache_leftpad 5 months ago
  Tri Dao d0787acc16 Relax dropout_fraction test 5 months ago
  Tri Dao dca6d89da4 Don't support softcap and dropout at the same time 5 months ago
  Tri Dao 81e01efd4b More typo fixes 5 months ago
  Tri Dao 3d41db3e2c Only test backward if there's no softcapping 5 months ago
  Nicolas Patry 8f873cc6ac Implement softcapping. (#1025) 5 months ago
  muoshuosha 6df7e0a02e Fix the varlen deterministic test (#1023) 5 months ago
  cao lei 6a2a16e994 fix typo (#974) 5 months ago
  Grigory Sizov f816dee63c Support unpadded LSE layout (#970) 5 months ago
  Grigory Sizov 2a15840f09 Enable paged attention in varlen forward (#831) 9 months ago
  Tri Dao 2406f28805 Enable headdim 256 backward on consumer GPUs (Ampere, Ada) 9 months ago
  Tri Dao 54e80a3829 Implement page KV cache 10 months ago
  Tri Dao 10dad61277 apply_dropout now takes tensor of rowcol layout 11 months ago
  Tri Dao a7b66ae25a Simplify writing softmax to gmem 11 months ago
  Tri Dao 732654583c Implement deterministic backward (thanks to Meituan) 11 months ago
  Tri Dao 5ab9b3667b Clean up alibi, implement non-causal alibi 1 year ago
  Tri Dao e279bf8ed9 [Gen] Accept cache_batch_idx to index into the KV cache 1 year ago
  Tri Dao 083e8f525f Implement local attention 1 year ago
  Tri Dao 65c234ed90 Don't over-allocate dq_accum in case of varlen 1 year ago
  Tri Dao 2d8ea9a530 Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza) 1 year ago
  Tri Dao 3250ff3d82 Swap seqlen_q, nheads for MQA when seqlen_q=1 for fwd (h/t Daniel H) 1 year ago
  Tri Dao ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache 1 year ago
  Tri Dao 56b7fc6ee0 Simplify the implementation of KVcache attn by appending KV first 1 year ago
  Tri Dao 37c6e05406 Implement flash_attn_with_kvcache 1 year ago
  Tri Dao 0c04943fa2 Require CUDA 11.6+, clean up setup.py 1 year ago
  Tri Dao b1fbbd8337 Implement splitKV attention 1 year ago