david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat)
It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to vLLM's Paged Attention).
@ ed6717d0c064a1cbf0fe9f358008ffc43e20187a

AlpinDale 9be43994fe feat: fbgemm quantization support (#601)		5 months ago
configs	fca911ee0a vLLM Upstream Sync (#526)	7 months ago
__init__.py	cf472315cc refactor: isolate FP8 from mixtral	6 months ago
fused_moe.py	1efd0f89b7 feat: support FP8 for DeepSeekV2 MoE	5 months ago
layer.py	9be43994fe feat: fbgemm quantization support (#601)	5 months ago
moe_pallas.py	e1475fbec7 feat: MoE support with Pallas GMM kernel for TPUs	5 months ago