Commit history

Author SHA1 Message Date
  Tri Dao 0519920e23 Deal with the case where q or k/v have length 0 2 weeks ago
  Tri Dao 39afd52bd2 Actually fix window_size for bwd pass 2 weeks ago
  Tri Dao a44cd67d3f Move testing util functions to a separate file 2 weeks ago
  Tri Dao a609d82315 Change extension name to flash_attn_3_cuda 2 weeks ago
  Tri Dao f907a13187 Tune tile sizes for fwd varlen on Sm80 and Sm86 3 weeks ago
  Tri Dao 76f14c61c9 Tune fwd tile sizes for Sm86 and Sm89 3 weeks ago
  Tri Dao 51484a7b56 Make backward epilogue work for Sm80 3 weeks ago
  Tri Dao 69bd392159 Merge bwd and bwd_varlen in the C++ API 1 month ago
  Tri Dao c3cdc0fd88 Add sm_margin as an option for overlapping with communication 1 month ago
  Tri Dao 7f5d73a162 Add env var to disable specific hdim 1 month ago
  Tri Dao 234c557190 Fix kvcache test in the case with cu_seqlens_k_new 1 month ago
  Tri Dao ba2061dfe8 Support cu_seqlens_k_new in flash_attn_with_kvcache 1 month ago
  Tri Dao 6807b1ea37 Longest-processing-time-first scheduler for causal 1 month ago
  Tri Dao fb9c9cbbe9 Support qkv_descale of shape (batch_size, nheads_kv) 1 month ago
  Tri Dao 6293008748 Add option for Mma0_is_RS and Mma1_is_RS in attn fwd 1 month ago
  Tri Dao 88fdffc16e Fix test for softcap FP8 1 month ago
  Tri Dao f5e89ff136 Tune tile size for bwd softcap 1 month ago
  Tri Dao 29cdfedd80 Use Bulk reduce instead of TMA for dQaccum, split across WGs 1 month ago
  Tri Dao 9c954f7021 Use num_split_heuristics in fwd and fwd_varlen 1 month ago
  Tri Dao 199c82052c Fix test for has_batch_idx 1 month ago
  Tri Dao 42fc4962f0 Uncomment tanh softcapping 2 months ago
  Tri Dao 3248babb9e QOL: Use env var to selectively disable features 2 months ago
  Tri Dao c9c40eba83 Uncomment local attn 2 months ago
  Tri Dao 94657af3e8 Add option for not doing intra-WG overlapping of gemm and softmax 2 months ago
  Tri Dao f0b5a6ec4c Wait for barrier_O at load_tail to avoid Cluster error 2 months ago
  Tri Dao fc2fd95a18 Renable FP8 kernels 2 months ago
  Tri Dao 3d0d147940 Early stop on actual num_splits in mha_combine kernel 2 months ago
  Tri Dao 1dc3364774 Consolidate seqlen info into a struct 2 months ago
  Tri Dao 0c49ac9a07 Implement rotary non-interleaved 2 months ago
  Tri Dao 9f82a326ad Implement rotary for attn decode 2 months ago