| Name | Last commit | Commit message | Last updated |
|---|---|---|---|
| aqlm | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago |
| autoquant | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago |
| awq | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago |
| compressed_tensors | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago |
| cutlass_w8a8 | b03b4d4c8c | fix: compute cutlass 3.x epilogues in fp32 instead of 16 | 7 months ago |
| exl2 | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago |
| fp8 | ad24e74a99 | feat: FP8 weight-only quantization support for Ampere GPUs | 6 months ago |
| gguf | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |
| gptq | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago |
| gptq_marlin | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago |
| int8_kvcache | 9810daa699 | feat: INT8 KV Cache (#298) | 1 year ago |
| marlin | 1587fab5de | fix: cuda version check for mma warning suppression | 7 months ago |
| quip | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago |
| squeezellm | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago |
| quant_ops.h | ad24e74a99 | feat: FP8 weight-only quantization support for Ampere GPUs | 6 months ago |