Tri Dao
|
40e534a7f6
Implement cache_leftpad
|
5 months ago |
Tri Dao
|
dca6d89da4
Don't support softcap and dropout at the same time
|
5 months ago |
Tri Dao
|
908511b2b6
Split into more .cu files to speed up compilation
|
5 months ago |
Tri Dao
|
1d536d7de5
Minor cleanup of softcapping
|
5 months ago |
Nicolas Patry
|
8f873cc6ac
Implement softcapping. (#1025)
|
5 months ago |
Nicolas Patry
|
5bf201966a
Fixing argument checking when using `seqlenq_ngroups_swapped`. (#976)
|
5 months ago |
Grigory Sizov
|
f816dee63c
Support unpadded LSE layout (#970)
|
5 months ago |
Tri Dao
|
9eb3d099c1
Transpose out when swapping seqlen_q and num_groups
|
8 months ago |
Driss Guessous
|
4a73e903da
Add in macros for defining __grid_constant__ (#852)
|
9 months ago |
Grigory Sizov
|
2a15840f09
Enable paged attention in varlen forward (#831)
|
9 months ago |
Tri Dao
|
2406f28805
Enable headdim 256 backward on consumer GPUs (Ampere, Ada)
|
9 months ago |
Tri Dao
|
d9a5cb291c
Fix dv = torch::empty_like(k) for mha_bwd_varlen as well
|
10 months ago |
Brian Hirsh
|
2423cca3ad
fix backward for when query and key have different contiguity (#818)
|
10 months ago |
Grigory Sizov
|
4687936413
Fix Windows build (#816)
|
10 months ago |
Jeremy Reizenstein
|
0658e320f6
Preprocessor switches to control functionality (#788)
|
10 months ago |
Tri Dao
|
54e80a3829
Implement page KV cache
|
10 months ago |
Tri Dao
|
ea8a25ca38
Remove configure in bwd kernel launch
|
10 months ago |
Grigory Sizov
|
af01244ddd
Add split-kv and M<->H swap to varlen forward decoding attention (#754)
|
10 months ago |
Tri Dao
|
0842ec0da4
Don't dispatch to local if window size >= seqlen_k
|
11 months ago |
Tri Dao
|
732654583c
Implement deterministic backward (thanks to Meituan)
|
11 months ago |
Tri Dao
|
5ab9b3667b
Clean up alibi, implement non-causal alibi
|
1 year ago |
Sanghun Cho
|
e4f726fc44
Support alibi, by Sanghun Cho from Kakao Brain
|
1 year ago |
Jeremy Reizenstein
|
ce3e7280f8
Allow varlen_fwd to take optional seqused_k (#647)
|
1 year ago |
Tri Dao
|
db2f80692c
Write zero to out / grad if seqlen_q or seqlen_k is zero
|
1 year ago |
Tri Dao
|
e279bf8ed9
[Gen] Accept cache_batch_idx to index into the KV cache
|
1 year ago |
Tri Dao
|
083e8f525f
Implement local attention
|
1 year ago |
Tri Dao
|
65c234ed90
Don't over-allocate dq_accum in case of varlen
|
1 year ago |
Tri Dao
|
2d8ea9a530
Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza)
|
1 year ago |
Tri Dao
|
3250ff3d82
Swap seqlen_q, nheads for MQA when seqlen_q=1 for fwd (h/t Daniel H)
|
1 year ago |
Tri Dao
|
ccbb14f38e
Implement rotary embedding in flash_attn_with_kvcache
|
1 year ago |