Tri Dao
|
d5893f3c74
Merge branch 'main' into changes_for_fp8
|
4 months ago |
Tri Dao
|
59594f2a67
Bump to v2.6.2
|
4 months ago |
Tri Dao
|
299563626f
Fix test with alibi and cache_leftpad
|
4 months ago |
Tri Dao
|
4488acee8d
[CI] Compile with torch 2.4.0.dev20240527
|
4 months ago |
Tri Dao
|
65f723bb9a
Split bwd into more .cu files to speed up compilation
|
4 months ago |
Tri Dao
|
5ca83a9c71
Clean up softcapping bwd a bit
|
4 months ago |
Tri Dao
|
751c762c9c
Don't specialize for hdim 224 to speed up compilation
|
4 months ago |
Driss Guessous
|
1c275eb070
Fix ima for split-kv kernel (#1085)
|
4 months ago |
janEbert
|
3c4053b75c
Make FA3 externally importable (#1053)
|
4 months ago |
rocking
|
d8f104e97a
Support AMD ROCm on FlashAttention 2 (#1010)
|
4 months ago |
Ying Zhang
|
dfe1a59e4b
Add var-seq-len to FA3 fp16 / bf16 fwd (#1072)
|
4 months ago |
Cameron Shinn
|
cb516f855b
Remove torchlib dependency from cpp files (#1083)
|
4 months ago |
Phil Wang
|
5f1ae4a34b
backwards for softcapping (#1033)
|
4 months ago |
youkaichao
|
ef3e358a25
remove lambda (#1056)
|
4 months ago |
Jorge António
|
4df62e1440
catch typo (#1058)
|
4 months ago |
Ganesh Bikshandi
|
81b379c54d
minor reformatting.
|
4 months ago |
Ganesh Bikshandi
|
e0607bb3aa
minor formatting.
|
4 months ago |
Ganesh Bikshandi
|
df66e974bc
fixed odd-seq-len-k.
|
4 months ago |
Ganesh Bikshandi
|
63e9277199
change to correct tile size for headdim=128.
|
4 months ago |
Ganesh Bikshandi
|
fe4c5b59df
undid clang formatting.
|
4 months ago |
Ganesh Bikshandi
|
4eacd5886d
enable all tests except odd-seq-lengths, where it crashes now.
|
4 months ago |
Ganesh Bikshandi
|
d5c2d1aa18
removed contiguous check.
|
4 months ago |
Ganesh Bikshandi
|
cdc966e81a
adding files for fp8 changes.
|
4 months ago |
Tri Dao
|
74b0761ff7
[FA3] BF16 forward
|
5 months ago |
Tri Dao
|
898dd4bbf2
Pass seqused_k to _flash_attn_varlen_forward
|
5 months ago |
Tri Dao
|
7ef24848cf
Add FA3 image
|
5 months ago |
Tri Dao
|
7f67966cc7
FA3 initial code release
|
5 months ago |
Tri Dao
|
b4a9dd6c9c
Temporarily switch to cutlass fork for more shapes
|
5 months ago |
Tri Dao
|
7551202cb2
Bump to v2.6.1
|
5 months ago |
Tri Dao
|
844912dca0
[CI] Switch from CUDA 12.2 to 12.3
|
5 months ago |