Commit History

Author      SHA1        Message  Date
Tri Dao     6807b1ea37  Longest-processing-time-first scheduler for causal  10 hours ago
Tri Dao     fb9c9cbbe9  Support qkv_descale of shape (batch_size, nheads_kv)  2 days ago
Tri Dao     6293008748  Add option for Mma0_is_RS and Mma1_is_RS in attn fwd  1 week ago
Tri Dao     88fdffc16e  Fix test for softcap FP8  1 week ago
Tri Dao     f5e89ff136  Tune tile size for bwd softcap  1 week ago
Tri Dao     29cdfedd80  Use Bulk reduce instead of TMA for dQaccum, split across WGs  1 week ago
Tri Dao     9c954f7021  Use num_split_heuristics in fwd and fwd_varlen  1 week ago
Tri Dao     199c82052c  Fix test for has_batch_idx  2 weeks ago
Tri Dao     42fc4962f0  Uncomment tanh softcapping  2 weeks ago
Tri Dao     3248babb9e  QOL: Use env var to selectively disable features  2 weeks ago
Tri Dao     c9c40eba83  Uncomment local attn  2 weeks ago
Tri Dao     94657af3e8  Add option for not doing intra-WG overlapping of gemm and softmax  3 weeks ago
Tri Dao     f0b5a6ec4c  Wait for barrier_O at load_tail to avoid Cluster error  3 weeks ago
Tri Dao     fc2fd95a18  Renable FP8 kernels  3 weeks ago
Tri Dao     3d0d147940  Early stop on actual num_splits in mha_combine kernel  3 weeks ago
Tri Dao     1dc3364774  Consolidate seqlen info into a struct  3 weeks ago
Tri Dao     0c49ac9a07  Implement rotary non-interleaved  3 weeks ago
Tri Dao     9f82a326ad  Implement rotary for attn decode  4 weeks ago
Tri Dao     4d00645c76  Implement appending new KV to KV cache  1 month ago
Tri Dao     82c1aa3514  Move PackGQA epilogue code to pack_gqa.h  1 month ago
Tri Dao     df96486c31  Decode: varlen, paged KV, leftpad  1 month ago
Tri Dao     ea7a98f15d  Fix backward with softcap  2 months ago
Tri Dao     6e8b25e426  Refactor  2 months ago
Ying Zhang  be6c1b98c4  small fixes  3 months ago
Ying Zhang  dff976a84a  fixes  3 months ago
Ying Zhang  8cbc8a042f  small fixes  3 months ago
Ying Zhang  db80387343  Add seqused_q in fwd / bwd and seqused_k in bwd.  3 months ago
jayhshah    c92ca63268  FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173)  3 months ago
Ying Zhang  53537da422  add a unittest  4 months ago
Tri Dao     c33de664a1  Fix import in test  4 months ago