| Name | Last commit | Commit message | Last updated |
| --- | --- | --- | --- |
| attention | c577c31aaa | feat: tree attention | 8 months ago |
| common | c577c31aaa | feat: tree attention | 8 months ago |
| distributed | b1caee23a6 | cache the p2p access check for memory saving | 8 months ago |
| endpoints | c577c31aaa | feat: tree attention | 8 months ago |
| engine | c577c31aaa | feat: tree attention | 8 months ago |
| executor | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 8 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 9 months ago |
| lora | fe17712f29 | fully working chunked prefill | 8 months ago |
| modeling | b28011e86e | fix: shard exl2 weights more evenly between ranks (#437) | 8 months ago |
| processing | c577c31aaa | feat: tree attention | 8 months ago |
| spec_decode | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 8 months ago |
| task_handler | c577c31aaa | feat: tree attention | 8 months ago |
| transformers_utils | 58b0616dd3 | feat: support sharded ggufs (#420) | 8 months ago |
| __init__.py | c2aaaefd57 | allow out-of-tree model registry | 8 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |