Tri Dao | ae3c1fb3e0 | Simplify bwd by setting NumdQWarpGroups = NumMmaWarpGroups | 3 weeks ago |
Tri Dao | 2c996ca25f | Use SeqlenInfo for bwd and epilogue | 4 weeks ago |
Tri Dao | 3b6ac2b954 | Use compile time constants in local mask | 1 month ago |
Tri Dao | bfbaafd043 | Fix bwd reading out of out LSE | 1 month ago |
Tri Dao | 29cdfedd80 | Use Bulk reduce instead of TMA for dQaccum, split across WGs | 1 month ago |
Tri Dao | e8a1edbeb2 | Clean up some #include | 1 month ago |
Tri Dao | 3ed79742fb | Add option to shuffle LSE and dPsum in the bwd | 1 month ago |
Tri Dao | 82dc825759 | Don't use the unsafe convert_type function | 1 month ago |
Tri Dao | df96486c31 | Decode: varlen, paged KV, leftpad | 1 month ago |
Tri Dao | ea7a98f15d | Fix backward with softcap | 2 months ago |
Tri Dao | 6e8b25e426 | Refactor | 3 months ago |
Ying Zhang | 1c9717d699 | address comments | 3 months ago |
Ying Zhang | dff976a84a | fixes | 4 months ago |
Ying Zhang | 7b4e68e04f | hopper local attention | 4 months ago |
Ying Zhang | db80387343 | Add seqused_q in fwd / bwd and seqused_k in bwd. | 4 months ago |
Tri Dao | bafe253042 | [FA3] Bwd | 5 months ago |