david/aphrodite-engine @ feat/perplexity
PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing-fast speeds (thanks to vLLM's PagedAttention).

drummerv e59dd4a90d fix: openai gguf chat template (#312)		10 months ago
common	c41462cfcd feat: exllamav2 quantization (#305)	10 months ago
endpoints	e59dd4a90d fix: openai gguf chat template (#312)	10 months ago
engine	c41462cfcd feat: exllamav2 quantization (#305)	10 months ago
kv_quant	9810daa699 feat: INT8 KV Cache (#298)	10 months ago
lora	a1d8ab9f3e fix: lora on quantized models (barred gguf) (#292)	10 months ago
modeling	968bde81bf fix: tensor parallel with GPTQ and AWQ quants (#307)	10 months ago
processing	c2d77b1822 chore: logging refactor (#302)	10 months ago
task_handler	c2d77b1822 chore: logging refactor (#302)	10 months ago
transformers_utils	e59dd4a90d fix: openai gguf chat template (#312)	10 months ago
__init__.py	ff898c2c80 bump version to 0.5.0 (#303)	10 months ago
py.typed	1c988a48b2 fix logging and add py.typed	1 year ago