david/aphrodite-engine @ c577c31aaa69eb5b4768b491b54dc407a7d2bbe4
PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to vLLM's Paged Attention).

AlpinDale 8d26cf3876 simplify model_executor logic		9 months ago
__init__.py	bd0ddf1cfe feat: EETQ quantization (#408)	9 months ago
aqlm.py	41beab5dc1 add exllamav2 tensor paralell, fused MoE for GPTQ/AWQ	10 months ago
awq.py	41beab5dc1 add exllamav2 tensor paralell, fused MoE for GPTQ/AWQ	10 months ago
base_config.py	aa244761ed formatting and typing	10 months ago
bitsandbytes.py	fa083286e3 Speculative Decoding Part 4: Lookahead scheduling (#402)	9 months ago
eetq.py	8d26cf3876 simplify model_executor logic	9 months ago
exl2.py	ea26c91e52 proper typing	10 months ago
gguf.py	ea26c91e52 proper typing	10 months ago
gptq.py	41beab5dc1 add exllamav2 tensor paralell, fused MoE for GPTQ/AWQ	10 months ago
hadamard.safetensors	c3a221eb02 feat: GGUF, QuIP#, and Marlin support (#228)	1 year ago
marlin.py	ea26c91e52 proper typing	10 months ago
quip.py	ea26c91e52 proper typing	10 months ago
quip_utils.py	e42a78381a feat: switch from pylint to ruff (#322)	10 months ago
schema.py	7528e0ce3e make detokenization optional	9 months ago
squeezellm.py	ea26c91e52 proper typing	10 months ago