Tri Dao | 5acb532214 | Switch to cutlass v3.6.0, fix perf regression for hdim 128 causal | 2 weeks ago
Tri Dao | 65a0f59ef5 | Change CP_ASYNC_CACHEGLOBAL to CP_ASYNC_CACHEGLOBAL_ZFILL for compat | 4 weeks ago
Tri Dao | 8ec230f833 | Fix to compile with Cutlass 3.6.0 | 4 weeks ago
Tri Dao | 6863fde13f | Fix bug in paged KV overshooting kBlockN in smem | 1 month ago
Tri Dao | 94657af3e8 | Add option for not doing intra-WG overlapping of gemm and softmax | 2 months ago
Tri Dao | fe412d6b36 | Redo rotary when contiguous | 2 months ago
Tri Dao | b2d3fe92ff | Move rotary to a separate file | 2 months ago
Tri Dao | 4d00645c76 | Implement appending new KV to KV cache | 2 months ago
Tri Dao | d00b88ee05 | Move PagedKV to a separate file | 2 months ago