Commit history

Author SHA1 Message Date
  Phil Wang 5f1ae4a34b backwards for softcapping (#1033) 6 months ago
  Tri Dao 40e534a7f6 Implement cache_leftpad 6 months ago
  Tri Dao 1d536d7de5 Minor cleanup of softcapping 6 months ago
  Nicolas Patry 8f873cc6ac Implement softcapping. (#1025) 6 months ago
  Liang ab59ec3590 remove swizzle part of `sV.data()` to get a completely non-swizzle `sVtNoSwizzle` (#984) 6 months ago
  Grigory Sizov f816dee63c Support unpadded LSE layout (#970) 6 months ago
  Tri Dao d732be1e67 Update to Cutlass 3.5 7 months ago
  Tri Dao 656daef4ea Use Cute's local_tile to get gQ, gK, gV 9 months ago
  ljss 3e9414f1c3 Minor fix in compute_attn_1rowblock_splitkv (#900) 9 months ago
  Tri Dao 54e80a3829 Implement page KV cache 1 year ago
  Tri Dao 8f4d82cf5e Update cutlass to v3.4.0 1 year ago
  Tri Dao 395e5a0dba Move rotary device functions to a separate file 1 year ago
  Tri Dao 66a127aef8 Refactor masking in fwd pass into 1 object 1 year ago
  Tri Dao 6f706eff96 Make Softmax an object 1 year ago
  Tri Dao 4ea866ca19 Make Alibi an object 1 year ago
  Tri Dao df1418f9db Move softmax_rescale_o to softmax.h 1 year ago
  Tri Dao 6777336a1c Move masking to a separate file (mask.h) 1 year ago
  Tri Dao 1274ec3e7e Move dropout to a separate file (dropout.h) 1 year ago
  Tri Dao 10dad61277 apply_dropout now takes tensor of rowcol layout 1 year ago
  Tri Dao a7b66ae25a Simplify writing softmax to gmem 1 year ago
  Tri Dao 8d1b169ed1 Simplify SmemLayoutVtransposed in kernel_traits.h 1 year ago
  Tri Dao 5ab9b3667b Clean up alibi, implement non-causal alibi 1 year ago
  Sanghun Cho e4f726fc44 Support alibi, by Sanghun Cho from Kakao Brain 1 year ago
  Tri Dao db2f80692c Write zero to out / grad if seqlen_q or seqlen_k is zero 1 year ago
  Tri Dao e279bf8ed9 [Gen] Accept cache_batch_idx to index into the KV cache 1 year ago
  Tri Dao 083e8f525f Implement local attention 1 year ago
  Tri Dao 2d8ea9a530 Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza) 1 year ago
  Tri Dao ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache 1 year ago
  Tri Dao 56b7fc6ee0 Simplify the implementation of KVcache attn by appending KV first 1 year ago
  Tri Dao bb9beb3645 Remove some unused headers 1 year ago