Tri Dao
|
6807b1ea37
Longest-processing-time-first scheduler for causal
|
1 month ago |
Tri Dao
|
29cdfedd80
Use Bulk reduce instead of TMA for dQaccum, split across WGs
|
1 month ago |
Tri Dao
|
314b9edfc0
Don't need to link to cuda lib anymore
|
1 month ago |
Tri Dao
|
f11624b746
Disable --split-compile due to ptxas register allocation failure
|
1 month ago |
Tri Dao
|
e8a1edbeb2
Clean up some #include
|
1 month ago |
Tri Dao
|
8ae77ea17c
Download nvcc 12.3 to compile for best perf
|
1 month ago |
Tri Dao
|
42fc4962f0
Uncomment tanh softcapping
|
1 month ago |
Tri Dao
|
6bc55b571c
Use --split-compile to speed up compilation
|
1 month ago |
Tri Dao
|
9553b2728f
More env vars to disable features
|
1 month ago |
Tri Dao
|
3248babb9e
QOL: Use env var to selectively disable features
|
1 month ago |
Tri Dao
|
94657af3e8
Add option for not doing intra-WG overlapping of gemm and softmax
|
1 month ago |
Tri Dao
|
fc2fd95a18
Re-enable FP8 kernels
|
1 month ago |
Tri Dao
|
64d92bce53
Split PagedKV into separate .cu files to speed up compilation
|
1 month ago |
Tri Dao
|
018b9af683
Move .cu files to instantiations, use generate_kernels.py
|
1 month ago |
Tri Dao
|
82c1aa3514
Move PackGQA epilogue code to pack_gqa.h
|
2 months ago |
Tri Dao
|
df96486c31
Decode: varlen, paged KV, leftpad
|
2 months ago |
Tri Dao
|
6e8b25e426
Refactor
|
3 months ago |
hlky
|
8476986721
Fix FAv3 compilation with MSVC (#1240)
|
4 months ago |
Ying Zhang
|
dff976a84a
fixes
|
4 months ago |
Tri Dao
|
bafe253042
[FA3] Bwd
|
5 months ago |
jayhshah
|
5018ac6ac5
Fp8 kernel with "in-kernel" transpose of V in producer (#1100)
|
5 months ago |
Tri Dao
|
3aae9c18c1
Revert "Changes For FP8 (#1075)"
|
5 months ago |
ganeshcolfax
|
1899c970c8
Changes For FP8 (#1075)
|
5 months ago |
janEbert
|
3c4053b75c
Make FA3 externally importable (#1053)
|
5 months ago |
Tri Dao
|
74b0761ff7
[FA3] BF16 forward
|
6 months ago |
Tri Dao
|
7f67966cc7
FA3 initial code release
|
6 months ago |