Tri Dao
|
f1a73d0740
Run isort and black on python files
|
1 year ago |
Tri Dao
|
5d079fdd7a
[Triton] Fix benchmark_causal, mention Triton version
|
1 year ago |
Tri Dao
|
6b5f271c6d
[Triton] Avoid einops repeat by using Tensor.expand
|
2 years ago |
Tri Dao
|
b8ccd20098
[Triton] Fix variable name from qkv to kv (h/t FrankZijlstra)
|
2 years ago |
Tri Dao
|
908a5b2244
Set num_warps=4 for headdim=64 in Triton fw (h/t Michael Benesty)
|
2 years ago |
Tri Dao
|
7479757191
Fix pipelining bug in Triton bwd with bias_type=matrix
|
2 years ago |
Tri Dao
|
557781933d
Parallelize CUDA bwd along seqlen_k instead of seqlen_q
|
2 years ago |
Tri Dao
|
62025e1aff
Fix more race condition in Triton bwd when there's bias
|
2 years ago |
Tri Dao
|
ff78ea4123
Fix race condition in Triton bwd when there's bias
|
2 years ago |
Tri Dao
|
86862cfd7b
Implement attention bias for Triton version
|
2 years ago |
Tri Dao
|
470010f59b
Fix race condition for Triton bwd for headdim 48 and 96
|
2 years ago |
Tri Dao
|
aacc10fbab
Fix race condition in Triton bwd for non-po2 headdims
|
2 years ago |
Tri Dao
|
1fb12afdfb
Avoid memcpy in the Triton bwd
|
2 years ago |
Tri Dao
|
731f154de3
Fix race conditions in the Triton bwd for headdim=64
|
2 years ago |
Tri Dao
|
9b0bc97872
Fix race condition in Triton fwd
|
2 years ago |
Tri Dao
|
215930bce3
Fix EVEN_M & EVEN_HEADDIM for headdim=40 in Triton bwd
|
2 years ago |
Tri Dao
|
4f81aff46e
Add debug_barrier for all headdims in Triton bwd
|
2 years ago |
Tri Dao
|
bedcbd6a71
Disable some autotune configs that give wrong results in Triton bwd
|
2 years ago |
Tri Dao
|
e78d509c64
[WIP] Support all head dimensions up to 128 in the Triton bwd
|
2 years ago |
Tri Dao
|
008951f1d9
Support all head dimensions up to 128 in the Triton fwd
|
2 years ago |
Tri Dao
|
b910bf14c1
Support arbitrary seqlens (both q & k) in Triton bwd
|
2 years ago |
Tri Dao
|
dc55469355
Support arbitrary seqlen_k in Triton bwd
|
2 years ago |
Tri Dao
|
d11341fd1a
Fix Triton fwd to support seqlen not multiples of 128
|
2 years ago |
Tri Dao
|
b0c0db81f6
Implement FlashAttention in Triton
|
2 years ago |