david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users at high speed (thanks to vLLM's Paged Attention). @ get_last_latency_hack

AlpinDale a3c03db735 fix: inline model loading conflicts with lora (#930)		1 month ago
..
rpc	53d0ba7c7c api: add endpoint for loading and unloading the model (#926)	1 month ago
__init__.py	07aa2a492f upstream: add option to specify tokenizer	1 year ago
api_server.py	a3c03db735 fix: inline model loading conflicts with lora (#930)	1 month ago
args.py	d46e70ac98 api: add inline model loading (#928)	1 month ago
logits_processors.py	62111fab17 feat: allow serving encoder-decoder models in the API server (#664)	4 months ago
protocol.py	f61acdd3ec api: add json_schema to OpenAI server (#915)	1 month ago
run_batch.py	81fa31bcaf feat: embeddings support for batched OAI endpoint (#676)	4 months ago
samplers.json	ac82b67f75 feat: naive context shift and various QoL changes (#289)	10 months ago
serving_chat.py	61c7182491 feat: enable prompt logprobs in OpenAI API (#720)	4 months ago
serving_completions.py	61c7182491 feat: enable prompt logprobs in OpenAI API (#720)	4 months ago
serving_embedding.py	0c162c8dad api: use fp32 for base64 embeddings (#919)	1 month ago
serving_engine.py	1d3a1fec47 feat: add load/unload endpoints for soft-prompts (#694)	4 months ago
serving_tokenization.py	3648170750 fix: gracefully handle missing chat template (#642)	4 months ago