Author | Commit | Message | Date
Phil Wang | 5f1ae4a34b | backwards for softcapping (#1033) | 6 months ago
Tri Dao | 40e534a7f6 | Implement cache_leftpad | 6 months ago
Tri Dao | 1d536d7de5 | Minor cleanup of softcapping | 6 months ago
Nicolas Patry | 8f873cc6ac | Implement softcapping. (#1025) | 6 months ago
Liang | ab59ec3590 | remove swizzle part of `sV.data()` to get a completely non-swizzle `sVtNoSwizzle` (#984) | 6 months ago
Grigory Sizov | f816dee63c | Support unpadded LSE layout (#970) | 6 months ago
Tri Dao | d732be1e67 | Update to Cutlass 3.5 | 7 months ago
Tri Dao | 656daef4ea | Use Cute's local_tile to get gQ, gK, gV | 9 months ago
ljss | 3e9414f1c3 | Minor fix in compute_attn_1rowblock_splitkv (#900) | 9 months ago
Tri Dao | 54e80a3829 | Implement page KV cache | 1 year ago
Tri Dao | 8f4d82cf5e | Update cutlass to v3.4.0 | 1 year ago
Tri Dao | 395e5a0dba | Move rotary device functions to a separate file | 1 year ago
Tri Dao | 66a127aef8 | Refactor masking in fwd pass into 1 object | 1 year ago
Tri Dao | 6f706eff96 | Make Softmax an object | 1 year ago
Tri Dao | 4ea866ca19 | Make Alibi an object | 1 year ago
Tri Dao | df1418f9db | Move softmax_rescale_o to softmax.h | 1 year ago
Tri Dao | 6777336a1c | Move masking to a separate file (mask.h) | 1 year ago
Tri Dao | 1274ec3e7e | Move dropout to a separate file (dropout.h) | 1 year ago
Tri Dao | 10dad61277 | apply_dropout now takes tensor of rowcol layout | 1 year ago
Tri Dao | a7b66ae25a | Simplify writing softmax to gmem | 1 year ago
Tri Dao | 8d1b169ed1 | Simplify SmemLayoutVtransposed in kernel_traits.h | 1 year ago
Tri Dao | 5ab9b3667b | Clean up alibi, implement non-causal alibi | 1 year ago
Sanghun Cho | e4f726fc44 | Support alibi, by Sanghun Cho from Kakao Brain | 1 year ago
Tri Dao | db2f80692c | Write zero to out / grad if seqlen_q or seqlen_k is zero | 1 year ago
Tri Dao | e279bf8ed9 | [Gen] Accept cache_batch_idx to index into the KV cache | 1 year ago
Tri Dao | 083e8f525f | Implement local attention | 1 year ago
Tri Dao | 2d8ea9a530 | Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza) | 1 year ago
Tri Dao | ccbb14f38e | Implement rotary embedding in flash_attn_with_kvcache | 1 year ago
Tri Dao | 56b7fc6ee0 | Simplify the implementation of KVcache attn by appending KV first | 1 year ago
Tri Dao | bb9beb3645 | Remove some unused headers | 1 year ago