Commit History

Author   SHA1        Message                                                           Date
Tri Dao  448ac57039  Try persistent scheduler with backward                            3 weeks ago
Tri Dao  0eb8f680a0  Fix env var to disable hdims                                      3 weeks ago
Tri Dao  7f5d73a162  Add env var to disable specific hdim                              3 weeks ago
Tri Dao  3e5d77a102  Group instantiations for different hdims together                 3 weeks ago
Tri Dao  234c557190  Fix kvcache test in the case with cu_seqlens_k_new                3 weeks ago
Tri Dao  ba2061dfe8  Support cu_seqlens_k_new in flash_attn_with_kvcache               3 weeks ago
Tri Dao  6807b1ea37  Longest-processing-time-first scheduler for causal                3 weeks ago
Tri Dao  fb9c9cbbe9  Support qkv_descale of shape (batch_size, nheads_kv)              3 weeks ago
Tri Dao  6293008748  Add option for Mma0_is_RS and Mma1_is_RS in attn fwd              1 month ago
Tri Dao  ae3c1fb3e0  Simplify bwd by setting NumdQWarpGroups = NumMmaWarpGroups        1 month ago
Tri Dao  2c996ca25f  Use SeqlenInfo for bwd and epilogue                               1 month ago
Tri Dao  88fdffc16e  Fix test for softcap FP8                                          1 month ago
Tri Dao  f5e89ff136  Tune tile size for bwd softcap                                    1 month ago
Tri Dao  3b6ac2b954  Use compile time constants in local mask                          1 month ago
Tri Dao  bfbaafd043  Fix bwd reading out of out LSE                                    1 month ago
Tri Dao  29cdfedd80  Use Bulk reduce instead of TMA for dQaccum, split across WGs      1 month ago
Tri Dao  9c954f7021  Use num_split_heuristics in fwd and fwd_varlen                    1 month ago
Tri Dao  314b9edfc0  Don't need to link to cuda lib anymore                            1 month ago
Tri Dao  f6e165becf  Change tile_size and local to avoid wgmma being serialized        1 month ago
Tri Dao  f11624b746  Disable --split-compile due to ptxas register allocation failure  1 month ago
Tri Dao  e8a1edbeb2  Clean up some #include                                            1 month ago
Tri Dao  8ae77ea17c  Download nvcc 12.3 to compile for best perf                       1 month ago
Tri Dao  199c82052c  Fix test for has_batch_idx                                        1 month ago
Tri Dao  3ed79742fb  Add option to shuffle LSE and dPsum in the bwd                    1 month ago
Tri Dao  42fc4962f0  Uncomment tanh softcapping                                        1 month ago
Tri Dao  6bc55b571c  Use --split-compile to speed up compilation                       1 month ago
Tri Dao  82dc825759  Don't use the unsafe convert_type function                        1 month ago
Tri Dao  9553b2728f  More env vars to disable features                                 1 month ago
Tri Dao  3248babb9e  QOL: Use env var to selectively disable features                  1 month ago
Tri Dao  c9c40eba83  Uncomment local attn                                              1 month ago