| Name | Last commit | Commit message | Last updated |
|------|-------------|----------------|--------------|
| attention | 1270b5567e | triton compile error for flash_attn | 9 months ago |
| common | 6c43e00e60 | add jamba modeling code | 8 months ago |
| distributed | b1caee23a6 | cache the p2p access check for memory saving | 9 months ago |
| endpoints | b1caee23a6 | cache the p2p access check for memory saving | 9 months ago |
| engine | a1f18f17e6 | modify the cache engine and model runner/worker to support mamba states | 8 months ago |
| executor | a1f18f17e6 | modify the cache engine and model runner/worker to support mamba states | 8 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 10 months ago |
| lora | fe17712f29 | fully working chunked prefill | 9 months ago |
| modeling | 65cd99ba89 | fix KVCache type | 8 months ago |
| processing | fe17712f29 | fully working chunked prefill | 9 months ago |
| spec_decode | 4d33ce60da | feat: Triton flash attention backend for ROCm (#407) | 9 months ago |
| task_handler | a1f18f17e6 | modify the cache engine and model runner/worker to support mamba states | 8 months ago |
| transformers_utils | 4fbb052b34 | add jamba config file | 8 months ago |
| __init__.py | c2aaaefd57 | allow out-of-tree model registry | 9 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |