Jay Shah | c06cc0ba9f | change cu_seqlens_k to seqused_k for kv cache api | 2 months ago
Jay Shah | 7c1473e0e5 | remove Is_batch_dynamic from seqlen traits and handle fp8 perf regression using smem boolean | 2 months ago
Jay Shah | 35f3542442 | refactor names | 2 months ago
Jay Shah | 6bb109238f | change seq len class per discussion | 2 months ago
Jay Shah | e36e004cb3 | change fp8 code path to allow for split kernel and kv cache without perf regression | 2 months ago
Jay Shah | 78736b4336 | remove unused code | 2 months ago
Jay Shah | 0a1a0c22b6 | refactor for split kv | 3 months ago
Jay Shah | 64a9cfb0fe | add causal logic | 3 months ago
Jay Shah | d2f049c8bc | change logic of Q loads for gqa parallelization | 3 months ago
Jay Shah | 13bad551cb | modify seqlentraits for gqa parallelism | 3 months ago
Jay Shah | 74f160ba43 | change template parameter for SeqLenTraits for ease of further extension | 4 months ago
Jay Shah | 7ee8ee48b2 | start extending seqlen traits for kv cache | 4 months ago
Ying Zhang | dfe1a59e4b | Add var-seq-len to FA3 fp16 / bf16 fwd (#1072) | 4 months ago