Revision history

Author SHA1 Message Date
  Phil Wang 5f1ae4a34b backwards for softcapping (#1033) 6 months ago
  Tri Dao 40e534a7f6 Implement cache_leftpad 6 months ago
  Tri Dao 1d536d7de5 Minor cleanup of softcapping 6 months ago
  Nicolas Patry 8f873cc6ac Implement softcapping. (#1025) 6 months ago
  Liang ab59ec3590 remove swizzle part of `sV.data()` to get a completely non-swizzle `sVtNoSwizzle` (#984) 6 months ago
  Grigory Sizov f816dee63c Support unpadded LSE layout (#970) 6 months ago
  Tri Dao d732be1e67 Update to Cutlass 3.5 7 months ago
  Tri Dao 656daef4ea Use Cute's local_tile to get gQ, gK, gV 9 months ago
  ljss 3e9414f1c3 Minor fix in compute_attn_1rowblock_splitkv (#900) 9 months ago
  Tri Dao 54e80a3829 Implement page KV cache 1 year ago
  Tri Dao 8f4d82cf5e Update cutlass to v3.4.0 1 year ago
  Tri Dao 395e5a0dba Move rotary device functions to a separate file 1 year ago
  Tri Dao 66a127aef8 Refactor masking in fwd pass into 1 object 1 year ago
  Tri Dao 6f706eff96 Make Softmax an object 1 year ago
  Tri Dao 4ea866ca19 Make Alibi an object 1 year ago
  Tri Dao df1418f9db Move softmax_rescale_o to softmax.h 1 year ago
  Tri Dao 6777336a1c Move masking to a separate file (mask.h) 1 year ago
  Tri Dao 1274ec3e7e Move dropout to a separate file (dropout.h) 1 year ago
  Tri Dao 10dad61277 apply_dropout now takes tensor of rowcol layout 1 year ago
  Tri Dao a7b66ae25a Simplify writing softmax to gmem 1 year ago
  Tri Dao 8d1b169ed1 Simplify SmemLayoutVtransposed in kernel_traits.h 1 year ago
  Tri Dao 5ab9b3667b Clean up alibi, implement non-causal alibi 1 year ago
  Sanghun Cho e4f726fc44 Support alibi, by Sanghun Cho from Kakao Brain 1 year ago
  Tri Dao db2f80692c Write zero to out / grad if seqlen_q or seqlen_k is zero 1 year ago
  Tri Dao e279bf8ed9 [Gen] Accept cache_batch_idx to index into the KV cache 1 year ago
  Tri Dao 083e8f525f Implement local attention 1 year ago
  Tri Dao 2d8ea9a530 Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza) 1 year ago
  Tri Dao ccbb14f38e Implement rotary embedding in flash_attn_with_kvcache 1 year ago
  Tri Dao 56b7fc6ee0 Simplify the implementation of KVcache attn by appending KV first 1 year ago
  Tri Dao bb9beb3645 Remove some unused headers 1 year ago