Tri Dao
|
0890032358
Implement backward pass for Sm80
|
4 days ago |
Tri Dao
|
a53f7380b6
Don't disable window_size if is_causal=true
|
4 days ago |
Tri Dao
|
f907a13187
Tune tile sizes for fwd varlen on Sm80 and Sm86
|
4 days ago |
Tri Dao
|
76f14c61c9
Tune fwd tile sizes for Sm86 and Sm89
|
4 days ago |
Tri Dao
|
c4c624f868
Rename bwd epilogue file
|
1 week ago |
Tri Dao
|
51484a7b56
Make backward epilogue work for Sm80
|
1 week ago |
Tri Dao
|
14894c5717
Make BwdPostprocessKernel work with Sm80
|
1 week ago |
Tri Dao
|
659a631f4c
Rename bwd classes to include Sm90 suffix
|
1 week ago |
Tri Dao
|
1fba7b499f
Merge mha_fwd, mha_varlen_fwd, mha_fwd_kvcache C++ interface
|
1 week ago |
Tri Dao
|
a901c7eeda
Make Sm80 forward pass work with persistent scheduler
|
1 week ago |
Tri Dao
|
65a0f59ef5
Change CP_ASYNC_CACHEGLOBAL to CP_ASYNC_CACHEGLOBAL_ZFILL for compat
|
1 week ago |
Tri Dao
|
b16d814c62
Revert to before Cutlass 3.6.0 update to investigate perf issue
|
1 week ago |
Tri Dao
|
2ba29df99e
Fix hanging when using AppendKV with persistent scheduler
|
1 week ago |
Tri Dao
|
8ec230f833
Fix to compile with Cutlass 3.6.0
|
1 week ago |
Tri Dao
|
64e6e0a09d
Switch to Cutlass 3.6.0 official release
|
2 weeks ago |
Tri Dao
|
c93451d5f8
Fix causal using n_block_min instead of n_block_min_causal_local_mas
|
2 weeks ago |
Tri Dao
|
6863fde13f
Fix bug in paged KV overshooting kBlockN in smem
|
2 weeks ago |
Tri Dao
|
5171269dab
Implement forward pass for Sm80
|
2 weeks ago |
Tri Dao
|
da264e5742
Change file names and class names to include sm90 suffix
|
2 weeks ago |
Tri Dao
|
111ee9d478
Add back gemm_sm80 to utils, make copy work with has_with_bool
|
2 weeks ago |
Tri Dao
|
5f25b9781f
Make epilogue_fwd work for Ampere
|
2 weeks ago |
Tri Dao
|
69bd392159
Merge bwd and bwd_varlen in the C++ API
|
2 weeks ago |
Tri Dao
|
c3cdc0fd88
Add sm_margin as an option for overlapping with communication
|
2 weeks ago |
Tri Dao
|
3f85126149
Use persistent scheduler when paged_kv
|
2 weeks ago |
Tri Dao
|
147ac33a2e
Tune num_splits for local, don't split when num_n_blocks is small
|
2 weeks ago |
Tri Dao
|
448ac57039
Try persistent scheduler with backward
|
3 weeks ago |
Tri Dao
|
0eb8f680a0
Fix env var to disable hdims
|
3 weeks ago |
Tri Dao
|
7f5d73a162
Add env var to disable specific hdim
|
3 weeks ago |
Tri Dao
|
3e5d77a102
Group instantiations for different hdims together
|
3 weeks ago |
Tri Dao
|
234c557190
Fix kvcache test in the case with cu_seqlens_k_new
|
3 weeks ago |