Commit History

Author | SHA1 | Message | Date
Tri Dao | 39afd52bd2 | Actually fix window_size for bwd pass | 3 days ago
Tri Dao | a44cd67d3f | Move testing util functions to a separate file | 3 days ago
Tri Dao | 82abd8daca | Don't disable window size when is_causal==true for bwd pass | 3 days ago
Tri Dao | a609d82315 | Change extension name to flash_attn_3_cuda | 3 days ago
Tri Dao | 5acb532214 | Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal | 3 days ago
Tri Dao | 0890032358 | Implement backward pass for Sm80 | 4 days ago
Tri Dao | a53f7380b6 | Don't disable window_size if is_causal=true | 4 days ago
Tri Dao | f907a13187 | Tune tile sizes for fwd varlen on Sm80 and Sm86 | 4 days ago
Tri Dao | 76f14c61c9 | Tune fwd tile sizes for Sm86 and Sm89 | 4 days ago
Tri Dao | c4c624f868 | Rename bwd epilogue file | 1 week ago
Tri Dao | 51484a7b56 | Make backward epilogue work for Sm80 | 1 week ago
Tri Dao | 14894c5717 | Make BwdPostprocessKernel work with Sm80 | 1 week ago
Tri Dao | 659a631f4c | Rename bwd classes to include Sm90 suffix | 1 week ago
Tri Dao | 1fba7b499f | Merge mha_fwd, mha_varlen_fwd, mha_fwd_kvcache C++ interface | 1 week ago
Tri Dao | a901c7eeda | Make Sm80 forward pass work with persistent scheduler | 1 week ago
Tri Dao | 65a0f59ef5 | Change CP_ASYNC_CACHEGLOBAL to CP_ASYNC_CACHEGLOBAL_ZFILL for compat | 1 week ago
Tri Dao | b16d814c62 | Revert to before Cutlass 3.6.0 update to investigate perf issue | 1 week ago
Tri Dao | 2ba29df99e | Fix hanging when using AppendKV with persistent scheduler | 1 week ago
Tri Dao | 8ec230f833 | Fix to compile with Cutlass 3.6.0 | 1 week ago
Tri Dao | 64e6e0a09d | Switch to Cutlass 3.6.0 official release | 2 weeks ago
Tri Dao | c93451d5f8 | Fix causal using n_block_min instead of n_block_min_causal_local_mas | 2 weeks ago
Tri Dao | 6863fde13f | Fix bug in paged KV overshooting kBlockN in smem | 2 weeks ago
Tri Dao | 5171269dab | Implement forward pass for Sm80 | 2 weeks ago
Tri Dao | da264e5742 | Change file names and class names to include sm90 suffix | 2 weeks ago
Tri Dao | 111ee9d478 | Add back gemm_sm80 to utils, make copy work with has_with_bool | 2 weeks ago
Tri Dao | 5f25b9781f | Make epilogue_fwd work for Ampere | 2 weeks ago
Tri Dao | 69bd392159 | Merge bwd and bwd_varlen in the C++ API | 2 weeks ago
Tri Dao | c3cdc0fd88 | Add sm_margin as an option for overlapping with communication | 2 weeks ago
Tri Dao | 3f85126149 | Use persistent scheduler when paged_kv | 2 weeks ago
Tri Dao | 147ac33a2e | Tune num_splits for local, don't split when num_n_blocks is small | 2 weeks ago