| File | Last commit | Commit message | Age |
| --- | --- | --- | --- |
| rpc/ | 0256ed236b | feat: windows support (#790) | 2 months ago |
| __init__.py | 07aa2a492f | upstream: add option to specify tokenizer | 1 year ago |
| api_server.py | 2fa112f86b | feat: update to serviceinfo v0.2 (#808) | 2 months ago |
| args.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| logits_processors.py | 62111fab17 | feat: allow serving encoder-decoder models in the API server (#664) | 4 months ago |
| protocol.py | 3392b81bf9 | sampler: allow parsing sampler order using strings (#858) | 1 month ago |
| run_batch.py | 81fa31bcaf | feat: embeddings support for batched OAI endpoint (#676) | 4 months ago |
| samplers.json | ac82b67f75 | feat: naive context shift and various QoL changes (#289) | 10 months ago |
| serving_chat.py | 61c7182491 | feat: enable prompt logprobs in OpenAI API (#720) | 3 months ago |
| serving_completions.py | 61c7182491 | feat: enable prompt logprobs in OpenAI API (#720) | 3 months ago |
| serving_embedding.py | ebf01d665b | fix: disable embeddings API for chat models (#710) | 4 months ago |
| serving_engine.py | 1d3a1fec47 | feat: add load/unload endpoints for soft-prompts (#694) | 4 months ago |
| serving_tokenization.py | 3648170750 | fix: gracefully handle missing chat template (#642) | 4 months ago |