david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users at high speed (thanks to vLLM's Paged Attention). @ feat/perplexity

AlpinDale 968bde81bf fix: tensor parallel with GPTQ and AWQ quants (#307)		10 months ago
..
quantization	c41462cfcd feat: exllamav2 quantization (#305)	10 months ago
triton_kernel	16615784b3 fix: prefix cache for turing gpus	10 months ago
__init__.py	07aa2a492f upstream: add option to specify tokenizer	1 year ago
activation.py	e31c6f0b45 feat: refactor modeling logic and support more models (#274)	10 months ago
attention.py	9810daa699 feat: INT8 KV Cache (#298)	10 months ago
layernorm.py	e31c6f0b45 feat: refactor modeling logic and support more models (#274)	10 months ago
linear.py	c2d77b1822 chore: logging refactor (#302)	10 months ago
rejection.py	95bdd35ec9 feat: rejection sampler (#197)	1 year ago
rotary_embedding.py	ea0f57b233 feat: allow further support for non-cuda devices (#247)	11 months ago
sampler.py	9fa99215f8 feat: add cubic sampling (#280)	10 months ago
vocab_parallel_embedding.py	968bde81bf fix: tensor parallel with GPTQ and AWQ quants (#307)	10 months ago