david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed (thanks to vLLM's Paged Attention). @ mistral_skip_special_tokens

AlpinDale 1264e0b5d8 api: add mistral function calling format to all models loaded with "mistral" format (#1053)		3 weeks ago
..
rpc	638c08d9dc fix: clean shutdown issues (#1047)	4 weeks ago
tool_parsers	a56bce4c94 fix: remove duplicate assignment in Hermes2ProToolParser	4 weeks ago
__init__.py	07aa2a492f upstream: add option to specify tokenizer	1 year ago
api_server.py	638c08d9dc fix: clean shutdown issues (#1047)	4 weeks ago
args.py	313e198557 api: implement OpenAI-compatible tools API for Hermes/Mistral models (#993)	1 month ago
logits_processors.py	62111fab17 feat: allow serving encoder-decoder models in the API server (#664)	4 months ago
protocol.py	055c8905a3 api: add sampling/engine option to return only deltas or final output (#1035)	4 weeks ago
run_batch.py	81fa31bcaf feat: embeddings support for batched OAI endpoint (#676)	4 months ago
samplers.json	ac82b67f75 feat: naive context shift and various QoL changes (#289)	10 months ago
serving_chat.py	1264e0b5d8 api: add mistral function calling format to all models loaded with "mistral" format (#1053)	3 weeks ago
serving_completions.py	055c8905a3 api: add sampling/engine option to return only deltas or final output (#1035)	4 weeks ago
serving_embedding.py	0c162c8dad api: use fp32 for base64 embeddings (#919)	1 month ago
serving_engine.py	c5c09720b0 api: log prompt truncation (#940)	1 month ago
serving_tokenization.py	055c8905a3 api: add sampling/engine option to return only deltas or final output (#1035)	4 weeks ago