Jay Shah
|
9b6cba16c1
remove some debug code
|
2 months ago |
Jay Shah
|
dec7dee1b1
fix integer sign compare warning
|
2 months ago |
Jay Shah
|
50cb90aea6
comment out unimplemented kwargs from flash_attn_with_kvcache
|
2 months ago |
Jay Shah
|
b3d60fa3a5
prune more dead code
|
2 months ago |
Jay Shah
|
8efb953eeb
remove commented out code
|
2 months ago |
Jay Shah
|
a7cce59d25
adjust tolerances in test script for kv cache
|
2 months ago |
Jay Shah
|
c06cc0ba9f
change cu_seqlens_k to seqused_k for kv cache api
|
2 months ago |
Jay Shah
|
7c1473e0e5
remove Is_batch_dynamic from seqlen traits and handle fp8 perf regression using smem boolean
|
2 months ago |
Jay Shah
|
1ecf821207
remove constexpr checks for actual seqlen in mainloop
|
2 months ago |
Jay Shah
|
8374e1fa78
remove test code
|
2 months ago |
Jay Shah
|
35f3542442
refactor names
|
2 months ago |
Jay Shah
|
b0f067efdc
revert epi change for fp8 due to measured perf regression
|
2 months ago |
Jay Shah
|
eb9c0ee22a
add rmem -> gmem for fp8
|
2 months ago |
Jay Shah
|
551b91f4c9
uniform notation
|
2 months ago |
Jay Shah
|
7169b23399
unify rmem -> gmem methods
|
2 months ago |
Jay Shah
|
ab5d336e61
better writeout logic with vectorization
|
2 months ago |
Jay Shah
|
d437d3dd5c
remove smem usage for when rmem -> gmem epilogue is used
|
2 months ago |
Ganesh Bikshandi
|
e49cb5f77c
passes except for hdim=256.
|
2 months ago |
Ganesh Bikshandi
|
dc2c952f37
compiles and builds. Not validating yet.
|
2 months ago |
Ganesh Bikshandi
|
a075e769fb
handle gqa_parallel with rmem-to-gmem. Not validating yet.
|
2 months ago |
Jay Shah
|
4a4dbd29c5
move IsRegToGmem
|
2 months ago |
Jay Shah
|
8f45a8cfa2
tests passing now for non-gqa impl
|
2 months ago |
Ganesh Bikshandi
|
f0b49460ec
changes to use tiledcopy (still not passing).
|
2 months ago |
Ganesh Bikshandi
|
8fbefa8ac4
adding rmem to gmem. (Not validating yet).
|
2 months ago |
Jay Shah
|
785d978165
fix bug with fp8 q layout
|
2 months ago |
Jay Shah
|
aa0e699412
move descale tensor declarations outside of conditional
|
2 months ago |
Jay Shah
|
fff4b5c09b
add split kv benchmark script
|
2 months ago |
Jay Shah
|
bc4b8722f6
add crude hdim 64 heuristic
|
2 months ago |
Jay Shah
|
930c8cad98
reorg mma code for less redundancy
|
2 months ago |
Jay Shah
|
03200a753f
removed old gqa cu files and unified methods
|
2 months ago |