Commit History

Author SHA1 Message Date
  Jay Shah c272619bf5 disable clusters 1 month ago
  Jay Shah bb169e2a54 revert to single tile scheduler 1 month ago
  Jay Shah 483b26e541 add working fp8 varlen 1 month ago
  Son Nguyen 478ee666cc Make namespace comment consistent (#1305) 1 month ago
  milesvant c1d146cbd5 Fix copy-paste error in hopper tests (#1279) 2 months ago
  jayhshah a5a75274bc FA3 kvcache + split kv + gqa parallelization (#1236) 2 months ago
  Tri Dao bedf877467 [CrossEntropy] Fix where labels address not aligned to 16 bytes 2 months ago
  rocking 53a4f34163 Hotfix due to change of upstream api (#1239) 2 months ago
  hlky 8476986721 Fix FAv3 compilation with MSVC (#1240) 2 months ago
  Ying Zhang 9cafd4ae14 Merge pull request #1233 from Dao-AILab/ipiszy/local_attn 2 months ago
  Ying Zhang 1c9717d699 address comments 2 months ago
  Zhihao Shen 30e1ef0f79 minify torch.torch.int32 to torch.int32 (#1237) 2 months ago
  Antoni Viros 83e41b3ca4 Add custom ops for compatibility with PT Compile (#1139) 2 months ago
  Ying Zhang be6c1b98c4 small fixes 2 months ago
  Ying Zhang dff976a84a fixes 3 months ago
  Ying Zhang 7b4e68e04f hopper local attention 3 months ago
  Ying Zhang af314d4006 Merge pull request #1182 from ipiszy/used_q 2 months ago
  Ying Zhang 8cbc8a042f small fixes 2 months ago
  Ying Zhang cdbbe844b1 minor changes to unpad_input test util func 3 months ago
  Ying Zhang db80387343 Add seqused_q in fwd / bwd and seqused_k in bwd. 3 months ago
  rocking e2182cc21d Support page kvcache in AMD ROCm (#1198) 3 months ago
  Tri Dao cc1690d9d6 [Rotary] Add test for rotary when qkv are packed an there's GQA 3 months ago
  Tri Dao 8c20cfef49 [Rotary] Support qkv block layout from GQA 3 months ago
  Charlene Yang bdf733be55 Add q, k, v descales to FA3 interface (#1210) 3 months ago
  Tri Dao c7f32a8409 [CrossEntropy] Support precomputed LSE 3 months ago
  juejuezi e371bea04f feat: change minimal supported CUDA version to 11.7 (#1206) 3 months ago
  Cameron Shinn 3cea2fb6ee Add ArchTag to pre/postprocess bwd kernels (#1180) 3 months ago
  jayhshah c92ca63268 FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173) 3 months ago
  Tri Dao d79f9b41a8 [CrossEntropy] Use online softmax to simplify implementation 3 months ago
  Jay Shah 32792d37ec add missing if condition for key_padding_mask in test_util.py 3 months ago