Commit History

Author SHA1 Message Date
  Jay Shah c272619bf5 disable clusters 2 months ago
  Jay Shah bb169e2a54 revert to single tile scheduler 2 months ago
  Jay Shah 483b26e541 add working fp8 varlen 2 months ago
  Son Nguyen 478ee666cc Make namespace comment consistent (#1305) 2 months ago
  milesvant c1d146cbd5 Fix copy-paste error in hopper tests (#1279) 3 months ago
  jayhshah a5a75274bc FA3 kvcache + split kv + gqa parallelization (#1236) 3 months ago
  Tri Dao bedf877467 [CrossEntropy] Fix where labels address not aligned to 16 bytes 3 months ago
  rocking 53a4f34163 Hotfix due to change of upstream api (#1239) 4 months ago
  hlky 8476986721 Fix FAv3 compilation with MSVC (#1240) 4 months ago
  Ying Zhang 9cafd4ae14 Merge pull request #1233 from Dao-AILab/ipiszy/local_attn 4 months ago
  Ying Zhang 1c9717d699 address comments 4 months ago
  Zhihao Shen 30e1ef0f79 minify torch.torch.int32 to torch.int32 (#1237) 4 months ago
  Antoni Viros 83e41b3ca4 Add custom ops for compatibility with PT Compile (#1139) 4 months ago
  Ying Zhang be6c1b98c4 small fixes 4 months ago
  Ying Zhang dff976a84a fixes 4 months ago
  Ying Zhang 7b4e68e04f hopper local attention 4 months ago
  Ying Zhang af314d4006 Merge pull request #1182 from ipiszy/used_q 4 months ago
  Ying Zhang 8cbc8a042f small fixes 4 months ago
  Ying Zhang cdbbe844b1 minor changes to unpad_input test util func 4 months ago
  Ying Zhang db80387343 Add seqused_q in fwd / bwd and seqused_k in bwd. 4 months ago
  rocking e2182cc21d Support page kvcache in AMD ROCm (#1198) 4 months ago
  Tri Dao cc1690d9d6 [Rotary] Add test for rotary when qkv are packed an there's GQA 4 months ago
  Tri Dao 8c20cfef49 [Rotary] Support qkv block layout from GQA 4 months ago
  Charlene Yang bdf733be55 Add q, k, v descales to FA3 interface (#1210) 4 months ago
  Tri Dao c7f32a8409 [CrossEntropy] Support precomputed LSE 4 months ago
  juejuezi e371bea04f feat: change minimal supported CUDA version to 11.7 (#1206) 4 months ago
  Cameron Shinn 3cea2fb6ee Add ArchTag to pre/postprocess bwd kernels (#1180) 4 months ago
  jayhshah c92ca63268 FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173) 4 months ago
  Tri Dao d79f9b41a8 [CrossEntropy] Use online softmax to simplify implementation 4 months ago
  Jay Shah 32792d37ec add missing if condition for key_padding_mask in test_util.py 5 months ago