Commit History

Author SHA1 Message Date
  Tri Dao 26a6e0a048 Switch to Cutlass 3.6.0 1 month ago
  Tri Dao ea7a98f15d Fix backward with softcap 2 months ago
  Tri Dao 6e8b25e426 Refactor 2 months ago
  Tri Dao 8eef9487e8 Switch to cutlass branch with m64n40k16 gmma instruction 2 months ago
  Tri Dao bedf877467 [CrossEntropy] Fix where labels address not aligned to 16 bytes 2 months ago
  rocking 53a4f34163 Hotfix due to change of upstream api (#1239) 2 months ago
  hlky 8476986721 Fix FAv3 compilation with MSVC (#1240) 2 months ago
  Ying Zhang 9cafd4ae14 Merge pull request #1233 from Dao-AILab/ipiszy/local_attn 2 months ago
  Ying Zhang 1c9717d699 address comments 2 months ago
  Zhihao Shen 30e1ef0f79 minify torch.torch.int32 to torch.int32 (#1237) 2 months ago
  Antoni Viros 83e41b3ca4 Add custom ops for compatibility with PT Compile (#1139) 2 months ago
  Ying Zhang be6c1b98c4 small fixes 3 months ago
  Ying Zhang dff976a84a fixes 3 months ago
  Ying Zhang 7b4e68e04f hopper local attention 3 months ago
  Ying Zhang af314d4006 Merge pull request #1182 from ipiszy/used_q 3 months ago
  Ying Zhang 8cbc8a042f small fixes 3 months ago
  Ying Zhang cdbbe844b1 minor changes to unpad_input test util func 3 months ago
  Ying Zhang db80387343 Add seqused_q in fwd / bwd and seqused_k in bwd. 3 months ago
  rocking e2182cc21d Support page kvcache in AMD ROCm (#1198) 3 months ago
  Tri Dao cc1690d9d6 [Rotary] Add test for rotary when qkv are packed and there's GQA 3 months ago
  Tri Dao 8c20cfef49 [Rotary] Support qkv block layout from GQA 3 months ago
  Charlene Yang bdf733be55 Add q, k, v descales to FA3 interface (#1210) 3 months ago
  Tri Dao c7f32a8409 [CrossEntropy] Support precomputed LSE 3 months ago
  juejuezi e371bea04f feat: change minimal supported CUDA version to 11.7 (#1206) 3 months ago
  Cameron Shinn 3cea2fb6ee Add ArchTag to pre/postprocess bwd kernels (#1180) 3 months ago
  jayhshah c92ca63268 FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173) 3 months ago
  Tri Dao d79f9b41a8 [CrossEntropy] Use online softmax to simplify implementation 3 months ago
  Jay Shah 32792d37ec add missing if condition for key_padding_mask in test_util.py 3 months ago
  Ying Zhang 28e7f4ddbd Merge pull request #1155 from ipiszy/fix 4 months ago
  Ying Zhang 53537da422 add a unittest 4 months ago