| Name | Last commit | Commit message | Last updated |
|------|-------------|----------------|--------------|
| attention | 1270b5567e | triton compile error for flash_attn | 9 months ago |
| common | 6c43e00e60 | add jamba modeling code | 8 months ago |
| distributed | b1caee23a6 | cache the p2p access check for memory saving | 9 months ago |
| endpoints | b1caee23a6 | cache the p2p access check for memory saving | 9 months ago |
| engine | a1f18f17e6 | modify the cache engine and model runner/worker to support mamba states | 8 months ago |
| executor | a1f18f17e6 | modify the cache engine and model runner/worker to support mamba states | 8 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 10 months ago |
| lora | fe17712f29 | fully working chunked prefill | 9 months ago |
| modeling | 65cd99ba89 | fix KVCache type | 8 months ago |
| processing | fe17712f29 | fully working chunked prefill | 9 months ago |
| spec_decode | 4d33ce60da | feat: Triton flash attention backend for ROCm (#407) | 9 months ago |
| task_handler | a1f18f17e6 | modify the cache engine and model runner/worker to support mamba states | 8 months ago |
| transformers_utils | 4fbb052b34 | add jamba config file | 8 months ago |
| __init__.py | c2aaaefd57 | allow out-of-tree model registry | 9 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |