david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's Paged Attention. @ 05e45aeb53702dac44913e0241b4ac2dabbe3a87

AlpinDale ad24e74a99 feat: FP8 weight-only quantization support for Ampere GPUs		6 months ago
aqlm	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
autoquant	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
awq	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
compressed_tensors	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
cutlass_w8a8	b03b4d4c8c fix: compute cutlass 3.x epilogues in fp32 instead of 16	7 months ago
exl2	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
fp8	ad24e74a99 feat: FP8 weight-only quantization support for Ampere GPUs	6 months ago
gguf	9d81716bfd [v0.5.3] Release Candidate (#388)	10 months ago
gptq	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
gptq_marlin	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
int8_kvcache	9810daa699 feat: INT8 KV Cache (#298)	1 year ago
marlin	1587fab5de fix: cuda version check for mma warning suppression	7 months ago
quip	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
squeezellm	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
quant_ops.h	ad24e74a99 feat: FP8 weight-only quantization support for Ampere GPUs	6 months ago