Author | Commit | Message | Date
Tri Dao | 6807b1ea37 | Longest-processing-time-first scheduler for causal | 1 day ago
Tri Dao | fb9c9cbbe9 | Support qkv_descale of shape (batch_size, nheads_kv) | 3 days ago
Tri Dao | 6293008748 | Add option for Mma0_is_RS and Mma1_is_RS in attn fwd | 1 week ago
Tri Dao | f5e89ff136 | Tune tile size for bwd softcap | 1 week ago
Tri Dao | 29cdfedd80 | Use Bulk reduce instead of TMA for dQaccum, split across WGs | 1 week ago
Tri Dao | 9c954f7021 | Use num_split_heuristics in fwd and fwd_varlen | 1 week ago
Tri Dao | f6e165becf | Change tile_size and local to avoid wgmma being serialized | 2 weeks ago
Tri Dao | 3ed79742fb | Add option to shuffle LSE and dPsum in the bwd | 2 weeks ago
Tri Dao | 42fc4962f0 | Uncomment tanh softcapping | 2 weeks ago
Tri Dao | 9553b2728f | More env vars to disable features | 3 weeks ago
Tri Dao | 3248babb9e | QOL: Use env var to selectively disable features | 3 weeks ago
Tri Dao | 94657af3e8 | Add option for not doing intra-WG overlapping of gemm and softmax | 3 weeks ago
Tri Dao | fc2fd95a18 | Re-enable FP8 kernels | 3 weeks ago
Tri Dao | 3d0d147940 | Early stop on actual num_splits in mha_combine kernel | 3 weeks ago
Tri Dao | 9fd6b977bb | Precompute the pointers in mha_combine kernel | 3 weeks ago
Tri Dao | 64d92bce53 | Split PagedKV into separate .cu files to speed up compilation | 3 weeks ago
Tri Dao | 586ba914bb | Move fwd tile size to a separate file | 3 weeks ago
Tri Dao | 5194d9b2e6 | Include torch/version.h for TORCH_VERSION* macros | 3 weeks ago
Tri Dao | 25dbfa6452 | Add heuristic for setting num_splits from FA2 | 3 weeks ago
Tri Dao | 018b9af683 | Move .cu files to instantiations, use generate_kernels.py | 3 weeks ago
Tri Dao | 0290574956 | Put #if to avoid redefinition with torch >= 2.4 | 4 weeks ago
Tri Dao | 9f82a326ad | Implement rotary for attn decode | 1 month ago
Tri Dao | 4d00645c76 | Implement appending new KV to KV cache | 1 month ago
Tri Dao | 4860b1068f | Fix mha_combine tests | 1 month ago
Tri Dao | a65af55f4a | Move mask_fn and load_Q into separate functions | 1 month ago
Tri Dao | df96486c31 | Decode: varlen, paged KV, leftpad | 1 month ago
Tri Dao | ea7a98f15d | Fix backward with softcap | 2 months ago
Tri Dao | 6e8b25e426 | Refactor | 2 months ago
Ying Zhang | dff976a84a | fixes | 3 months ago
Ying Zhang | 7b4e68e04f | hopper local attention | 3 months ago