Commit History

Author SHA1 Message Date
  Ying Zhang 1c9717d699 address comments 2 months ago
  Ying Zhang be6c1b98c4 small fixes 2 months ago
  Ying Zhang dff976a84a fixes 3 months ago
  Ying Zhang 7b4e68e04f hopper local attention 3 months ago
  Ying Zhang af314d4006 Merge pull request #1182 from ipiszy/used_q 2 months ago
  Ying Zhang 8cbc8a042f small fixes 2 months ago
  Ying Zhang cdbbe844b1 minor changes to unpad_input test util func 3 months ago
  Ying Zhang db80387343 Add seqused_q in fwd / bwd and seqused_k in bwd. 3 months ago
  rocking e2182cc21d Support page kvcache in AMD ROCm (#1198) 3 months ago
  Tri Dao cc1690d9d6 [Rotary] Add test for rotary when qkv are packed and there's GQA 3 months ago
  Tri Dao 8c20cfef49 [Rotary] Support qkv block layout from GQA 3 months ago
  Charlene Yang bdf733be55 Add q, k, v descales to FA3 interface (#1210) 3 months ago
  Tri Dao c7f32a8409 [CrossEntropy] Support precomputed LSE 3 months ago
  juejuezi e371bea04f feat: change minimal supported CUDA version to 11.7 (#1206) 3 months ago
  Cameron Shinn 3cea2fb6ee Add ArchTag to pre/postprocess bwd kernels (#1180) 3 months ago
  jayhshah c92ca63268 FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173) 3 months ago
  Tri Dao d79f9b41a8 [CrossEntropy] Use online softmax to simplify implementation 3 months ago
  Jay Shah 32792d37ec add missing if condition for key_padding_mask in test_util.py 3 months ago
  Ying Zhang 28e7f4ddbd Merge pull request #1155 from ipiszy/fix 3 months ago
  Ying Zhang 53537da422 add a unittest 3 months ago
  Ying Zhang a3a257c71d Fix out-of-bound writes for var-seq-len zero-length KVs 4 months ago
  Tri Dao bcd918f275 [LayerNorm] Add option to write result to out and residual_out 4 months ago
  Tri Dao bd82d6c6eb Revert "[LayerNorm] Don't store x + residual if we don't need gradients" 4 months ago
  Tri Dao 800401847e [LayerNorm] Don't store x + residual if we don't need gradients 4 months ago
  Garrett Byrd 16025d8cc9 Clearer install instructions for CUDA and ROCm backends (#1147) 4 months ago
  Ying Zhang 3669b25206 bwd benchmark + small fixes (#1129) 4 months ago
  Tri Dao 5d5bfbb619 Remove contiguous checks 4 months ago
  SueJane 3f1b4d38e7 Fix: check the type of max_seqlen_k instead of checking max_seqlen twice (#1127) 4 months ago
  Tri Dao 3f6ff1c1c5 Remove struct : cute::aligned_struct to avoid error with gcc 12 4 months ago
  Tri Dao c33de664a1 Fix import in test 4 months ago