| Name | Commit | Message | Age |
|---|---|---|---|
| rpc | b5aa11020b | api: fix crashes under very high loads (#878) | 4 weeks ago |
| __init__.py | 07aa2a492f | upstream: add option to specify tokenizer | 1 year ago |
| api_server.py | 22a4cd4595 | core: fix spec decode metrics and envs circular import (#889) | 3 weeks ago |
| args.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| logits_processors.py | 62111fab17 | feat: allow serving encoder-decoder models in the API server (#664) | 4 months ago |
| protocol.py | 3392b81bf9 | sampler: allow parsing sampler order using strings (#858) | 1 month ago |
| run_batch.py | 81fa31bcaf | feat: embeddings support for batched OAI endpoint (#676) | 4 months ago |
| samplers.json | ac82b67f75 | feat: naive context shift and various QoL changes (#289) | 10 months ago |
| serving_chat.py | 61c7182491 | feat: enable prompt logprobs in OpenAI API (#720) | 4 months ago |
| serving_completions.py | 61c7182491 | feat: enable prompt logprobs in OpenAI API (#720) | 4 months ago |
| serving_embedding.py | ebf01d665b | fix: disable embeddings API for chat models (#710) | 4 months ago |
| serving_engine.py | 1d3a1fec47 | feat: add load/unload endpoints for soft-prompts (#694) | 4 months ago |
| serving_tokenization.py | 3648170750 | fix: gracefully handle missing chat template (#642) | 4 months ago |