Tri Dao
|
f907a13187
Tune tile sizes for fwd varlen on Sm80 and Sm86
|
4 weeks ago |
Tri Dao
|
76f14c61c9
Tune fwd tile sizes for Sm86 and Sm89
|
4 weeks ago |
Tri Dao
|
659a631f4c
Rename bwd classes to include Sm90 suffix
|
1 month ago |
Tri Dao
|
a901c7eeda
Make Sm80 forward pass work with persistent scheduler
|
1 month ago |
Tri Dao
|
5171269dab
Implement forward pass for Sm80
|
1 month ago |
Tri Dao
|
da264e5742
Change file names and class names to include sm90 suffix
|
1 month ago |
Tri Dao
|
5f25b9781f
Make epilogue_fwd work for Ampere
|
1 month ago |
Tri Dao
|
69bd392159
Merge bwd and bwd_varlen in the C++ API
|
1 month ago |
Tri Dao
|
c3cdc0fd88
Add sm_margin as an option for overlapping with communication
|
1 month ago |
Tri Dao
|
147ac33a2e
Tune num_splits for local, don't split when num_n_blocks is small
|
1 month ago |
Tri Dao
|
3e5d77a102
Group instantiations for different hdims together
|
1 month ago |
Tri Dao
|
6807b1ea37
Longest-processing-time-first scheduler for causal
|
1 month ago |
Tri Dao
|
fb9c9cbbe9
Support qkv_descale of shape (batch_size, nheads_kv)
|
1 month ago |
Tri Dao
|
6293008748
Add option for Mma0_is_RS and Mma1_is_RS in attn fwd
|
1 month ago |
Tri Dao
|
9c954f7021
Use num_split_heuristics in fwd and fwd_varlen
|
2 months ago |
Tri Dao
|
f6e165becf
Change tile_size and local to avoid wgmma being serialized
|
2 months ago |
Tri Dao
|
42fc4962f0
Uncomment tanh softcapping
|
2 months ago |
Tri Dao
|
9553b2728f
More env vars to disable features
|
2 months ago |
Tri Dao
|
3248babb9e
QOL: Use env var to selectively disable features
|
2 months ago |
Tri Dao
|
c9c40eba83
Uncomment local attn
|
2 months ago |
Tri Dao
|
94657af3e8
Add option for not doing intra-WG overlapping of gemm and softmax
|
2 months ago |
Tri Dao
|
fc2fd95a18
Renable FP8 kernels
|
2 months ago |
Tri Dao
|
64d92bce53
Split PagedKV into separate .cu files to speed up compilation
|
2 months ago |
Tri Dao
|
bc8a001d8d
Load cos/sin by splitting the work among threads on the same row
|
2 months ago |
Tri Dao
|
1dc3364774
Consolidate seqlen info into a struct
|
2 months ago |
Tri Dao
|
586ba914bb
Move fwd tile size to a separate file
|
2 months ago |
Tri Dao
|
018b9af683
Move .cu files to instantiations, use generate_kernels.py
|
2 months ago |
Tri Dao
|
0c49ac9a07
Implement rotary non-interleaved
|
2 months ago |
Tri Dao
|
b2d3fe92ff
Move rotary to a separate file
|
2 months ago |
Tri Dao
|
9f82a326ad
Implement rotary for attn decode
|
2 months ago |