Commit History

Author SHA1 Message Date
  Tri Dao 0519920e23 Deal with the case where q or k/v have length 0 2 weeks ago
  Tri Dao 39afd52bd2 Actually fix window_size for bwd pass 2 weeks ago
  Tri Dao a44cd67d3f Move testing util functions to a separate file 2 weeks ago
  Tri Dao a609d82315 Change extension name to flash_attn_3_cuda 2 weeks ago
  Tri Dao f907a13187 Tune tile sizes for fwd varlen on Sm80 and Sm86 2 weeks ago
  Tri Dao 76f14c61c9 Tune fwd tile sizes for Sm86 and Sm89 3 weeks ago
  Tri Dao 51484a7b56 Make backward epilogue work for Sm80 3 weeks ago
  Tri Dao 69bd392159 Merge bwd and bwd_varlen in the C++ API 1 month ago
  Tri Dao c3cdc0fd88 Add sm_margin as an option for overlapping with communication 1 month ago
  Tri Dao 7f5d73a162 Add env var to disable specific hdim 1 month ago
  Tri Dao 234c557190 Fix kvcache test in the case with cu_seqlens_k_new 1 month ago
  Tri Dao ba2061dfe8 Support cu_seqlens_k_new in flash_attn_with_kvcache 1 month ago
  Tri Dao 6807b1ea37 Longest-processing-time-first scheduler for causal 1 month ago
  Tri Dao fb9c9cbbe9 Support qkv_descale of shape (batch_size, nheads_kv) 1 month ago
  Tri Dao 6293008748 Add option for Mma0_is_RS and Mma1_is_RS in attn fwd 1 month ago
  Tri Dao 88fdffc16e Fix test for softcap FP8 1 month ago
  Tri Dao f5e89ff136 Tune tile size for bwd softcap 1 month ago
  Tri Dao 29cdfedd80 Use Bulk reduce instead of TMA for dQaccum, split across WGs 1 month ago
  Tri Dao 9c954f7021 Use num_split_heuristics in fwd and fwd_varlen 1 month ago
  Tri Dao 199c82052c Fix test for has_batch_idx 1 month ago
  Tri Dao 42fc4962f0 Uncomment tanh softcapping 2 months ago
  Tri Dao 3248babb9e QOL: Use env var to selectively disable features 2 months ago
  Tri Dao c9c40eba83 Uncomment local attn 2 months ago
  Tri Dao 94657af3e8 Add option for not doing intra-WG overlapping of gemm and softmax 2 months ago
  Tri Dao f0b5a6ec4c Wait for barrier_O at load_tail to avoid Cluster error 2 months ago
  Tri Dao fc2fd95a18 Re-enable FP8 kernels 2 months ago
  Tri Dao 3d0d147940 Early stop on actual num_splits in mha_combine kernel 2 months ago
  Tri Dao 1dc3364774 Consolidate seqlen info into a struct 2 months ago
  Tri Dao 0c49ac9a07 Implement rotary non-interleaved 2 months ago
  Tri Dao 9f82a326ad Implement rotary for attn decode 2 months ago