Tri Dao
|
22c0358f4b
Fix nvcc_from_env not found
|
1 week ago |
Tri Dao
|
e94f7e89dc
Always enable PackGQA is Split to reduce compilation and binary size
|
1 week ago |
Tri Dao
|
40fa35acd8
Always enable PackGQA if PagedKV to reduce compilation and bin size
|
1 week ago |
Tri Dao
|
a84a237d2a
Split bwd softcap compilation units for Sm80
|
1 week ago |
Tri Dao
|
518e919a60
Fix softcap compilation
|
1 week ago |
Tri Dao
|
ea8cd7fe7b
Ungroup hdim and group softcap for Sm80 compilation
|
1 week ago |
Tri Dao
|
df5fe55264
Change tile sizes for Sm8x to reduce stack frame
|
1 week ago |
Tri Dao
|
8dd0b479d5
Always enable PackGQA for Sm8x to reduce compilation and binary size
|
1 week ago |
Tri Dao
|
9b0f19cc5c
Only use nvcc 12.3 for Sm90, use standard nvcc for Sm80
|
1 week ago |
Tri Dao
|
5f525322ec
Only pass sm_90a compile flag to Sm90 kernels, same w Sm89 kernels
|
1 week ago |
Tri Dao
|
180ff782dd
Template for Sm86
|
1 week ago |
Tri Dao
|
7bc3f031a4
Compile for both Sm80 and Sm90
|
1 week ago |
Tri Dao
|
7a802796e1
Big refactor and update
|
1 week ago |
jayhshah
|
a5a75274bc
FA3 kvcache + split kv + gqa parallelization (#1236)
|
3 months ago |
hlky
|
8476986721
Fix FAv3 compilation with MSVC (#1240)
|
4 months ago |
Ying Zhang
|
dff976a84a
fixes
|
4 months ago |
Tri Dao
|
bafe253042
[FA3] Bwd
|
5 months ago |
jayhshah
|
5018ac6ac5
Fp8 kernel with "in-kernel" transpose of V in producer (#1100)
|
5 months ago |
Tri Dao
|
3aae9c18c1
Revert "Changes For FP8 (#1075)"
|
5 months ago |
ganeshcolfax
|
1899c970c8
Changes For FP8 (#1075)
|
5 months ago |
janEbert
|
3c4053b75c
Make FA3 externally importable (#1053)
|
6 months ago |
Tri Dao
|
74b0761ff7
[FA3] BF16 forward
|
6 months ago |
Tri Dao
|
7f67966cc7
FA3 initial code release
|
6 months ago |