david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's Paged Attention. @ 766ea79b89d532f8b06a414f05fd6aa35a2ed4ea

AlpinDale f2b6dc3872 cpu: add support for W8A8 quantization via compressed-tensor (#1017)		2 weeks ago
__init__.py	f2b6dc3872 cpu: add support for W8A8 quantization via compressed-tensor (#1017)	2 weeks ago
cpu.py	f2b6dc3872 cpu: add support for W8A8 quantization via compressed-tensor (#1017)	2 weeks ago
cuda.py	9f3e7c86e2 feat: add fused Marlin MoE kernel (#934)	2 weeks ago
interface.py	f2b6dc3872 cpu: add support for W8A8 quantization via compressed-tensor (#1017)	2 weeks ago
rocm.py	81c28d2a7f fix: use nvml to get consistent device names (#739)	3 months ago
tpu.py	f2b6dc3872 cpu: add support for W8A8 quantization via compressed-tensor (#1017)	2 weeks ago