Commit History

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Jay Shah | c272619bf5 | disable clusters | 1 month ago |
| Jay Shah | bb169e2a54 | revert to single tile scheduler | 1 month ago |
| Jay Shah | 483b26e541 | add working fp8 varlen | 1 month ago |
| Son Nguyen | 478ee666cc | Make namespace comment consistent (#1305) | 1 month ago |
| milesvant | c1d146cbd5 | Fix copy-paste error in hopper tests (#1279) | 2 months ago |
| jayhshah | a5a75274bc | FA3 kvcache + split kv + gqa parallelization (#1236) | 2 months ago |
| Tri Dao | bedf877467 | [CrossEntropy] Fix where labels address not aligned to 16 bytes | 2 months ago |
| rocking | 53a4f34163 | Hotfix due to change of upstream api (#1239) | 2 months ago |
| hlky | 8476986721 | Fix FAv3 compilation with MSVC (#1240) | 2 months ago |
| Ying Zhang | 9cafd4ae14 | Merge pull request #1233 from Dao-AILab/ipiszy/local_attn | 2 months ago |
| Ying Zhang | 1c9717d699 | address comments | 2 months ago |
| Zhihao Shen | 30e1ef0f79 | minify torch.torch.int32 to torch.int32 (#1237) | 2 months ago |
| Antoni Viros | 83e41b3ca4 | Add custom ops for compatibility with PT Compile (#1139) | 2 months ago |
| Ying Zhang | be6c1b98c4 | small fixes | 3 months ago |
| Ying Zhang | dff976a84a | fixes | 3 months ago |
| Ying Zhang | 7b4e68e04f | hopper local attention | 3 months ago |
| Ying Zhang | af314d4006 | Merge pull request #1182 from ipiszy/used_q | 3 months ago |
| Ying Zhang | 8cbc8a042f | small fixes | 3 months ago |
| Ying Zhang | cdbbe844b1 | minor changes to unpad_input test util func | 3 months ago |
| Ying Zhang | db80387343 | Add seqused_q in fwd / bwd and seqused_k in bwd. | 3 months ago |
| rocking | e2182cc21d | Support page kvcache in AMD ROCm (#1198) | 3 months ago |
| Tri Dao | cc1690d9d6 | [Rotary] Add test for rotary when qkv are packed an there's GQA | 3 months ago |
| Tri Dao | 8c20cfef49 | [Rotary] Support qkv block layout from GQA | 3 months ago |
| Charlene Yang | bdf733be55 | Add q, k, v descales to FA3 interface (#1210) | 3 months ago |
| Tri Dao | c7f32a8409 | [CrossEntropy] Support precomputed LSE | 3 months ago |
| juejuezi | e371bea04f | feat: change minimal supported CUDA version to 11.7 (#1206) | 3 months ago |
| Cameron Shinn | 3cea2fb6ee | Add ArchTag to pre/postprocess bwd kernels (#1180) | 3 months ago |
| jayhshah | c92ca63268 | FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173) | 3 months ago |
| Tri Dao | d79f9b41a8 | [CrossEntropy] Use online softmax to simplify implementation | 3 months ago |
| Jay Shah | 32792d37ec | add missing if condition for key_padding_mask in test_util.py | 3 months ago |