Author | Commit | Message | Age
Tri Dao | 2c996ca25f | Use SeqlenInfo for bwd and epilogue | 1 month ago
Tri Dao | e8a1edbeb2 | Clean up some #include | 1 month ago
Tri Dao | 82dc825759 | Don't use the unsafe convert_type function | 1 month ago
Tri Dao | a4d41d2605 | Fix epilogue compilation | 1 month ago
Tri Dao | 95ba9e51e5 | Simplify epilogue when split by using thread_mma.partition_C | 1 month ago
Tri Dao | e7b93e3902 | Clean up mha_combine kernel | 1 month ago
Tri Dao | 3d0d147940 | Early stop on actual num_splits in mha_combine kernel | 1 month ago
Tri Dao | fe412d6b36 | Redo rotary when contiguous | 1 month ago
Tri Dao | c5ba47b3d5 | Add fence.async to epilogue | 2 months ago
Tri Dao | 2e4eabd082 | Move barrier_O arrive from mainloop to epilogue to simplify | 2 months ago
Tri Dao | 82c1aa3514 | Move PackGQA epilogue code to pack_gqa.h | 2 months ago
Tri Dao | df96486c31 | Decode: varlen, paged KV, leftpad | 2 months ago
Tri Dao | 6e8b25e426 | Refactor | 3 months ago
Ying Zhang | a3a257c71d | Fix out-of-bound writes for var-seq-len zero-length KVs | 5 months ago
Ying Zhang | 3669b25206 | bwd benchmark + small fixes (#1129) | 5 months ago
jayhshah | 5018ac6ac5 | Fp8 kernel with "in-kernel" transpose of V in producer (#1100) | 5 months ago
Tri Dao | 3aae9c18c1 | Revert "Changes For FP8 (#1075)" | 5 months ago
ganeshcolfax | 1899c970c8 | Changes For FP8 (#1075) | 5 months ago
Ying Zhang | dfe1a59e4b | Add var-seq-len to FA3 fp16 / bf16 fwd (#1072) | 5 months ago
Tri Dao | 74b0761ff7 | [FA3] BF16 forward | 6 months ago
Tri Dao | 7f67966cc7 | FA3 initial code release | 6 months ago