Jay Shah | b3d60fa3a5 | prune more dead code | 2 months ago
Jay Shah | c06cc0ba9f | change cu_seqlens_k to seqused_k for kv cache api | 2 months ago
Jay Shah | 1ecf821207 | remove constexpr checks for actual seqlen in mainloop | 2 months ago
Jay Shah | 35f3542442 | refactor names | 2 months ago
Jay Shah | b0f067efdc | revert epi change for fp8 due to measured perf regression | 2 months ago
Jay Shah | eb9c0ee22a | add rmem -> gmem for fp8 | 2 months ago
Jay Shah | d437d3dd5c | remove smem usage for when rmem -> gmem epilogue is used | 2 months ago
Jay Shah | 785d978165 | fix bug with fp8 q layout | 2 months ago
Jay Shah | 930c8cad98 | reorg mma code for less redundancy | 2 months ago
Jay Shah | 6bb109238f | change seq len class per discussion | 2 months ago
Jay Shah | 16eb1e53fd | remove deprecated fp8 code | 2 months ago
Jay Shah | 64a0a91fe9 | enable Is_local with fp8 | 2 months ago
Jay Shah | 81d402463e | prune unused code | 2 months ago
Jay Shah | be481cac27 | add Is_local back in | 2 months ago
Jay Shah | 6111666130 | consolidate nblock min max methods | 2 months ago
Jay Shah | b5cac6d586 | rebase with Is_local disabled temporarily | 2 months ago
Jay Shah | 2472e5e0b4 | add 'in principle' fp8 kv cache support | 2 months ago
Jay Shah | e36e004cb3 | change fp8 code path to allow for split kernel and kv cache without perf regression | 2 months ago
Jay Shah | f3e5bd47cb | update parameters | 2 months ago
Jay Shah | 23bf5b0fd6 | fix bug with finalize for split kv | 3 months ago
Jay Shah | 68ff3f7bee | add 1 mma warpgroup option, enable splitkv for hdim 256 | 3 months ago
Jay Shah | 1135dbd0ab | re-enable fp16/bf16 fwd | 3 months ago
Jay Shah | 0a1a0c22b6 | refactor for split kv | 3 months ago
Jay Shah | 64a9cfb0fe | add causal logic | 3 months ago
Jay Shah | 5704a1f424 | fix some errors | 3 months ago
Jay Shah | 535b8279bb | complete gqa parallel changes for non-causal | 3 months ago
Jay Shah | d2f049c8bc | change logic of Q loads for gqa parallelization | 3 months ago
Jay Shah | fc8f704f28 | decouple types of seqlen traits q and k | 3 months ago
Jay Shah | 74f160ba43 | change template parameter for SeqLenTraits for ease of further extension | 3 months ago
Jay Shah | be0e36ddbf | enable use of actual seqlen for kv cache | 4 months ago