david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to vLLM's Paged Attention). @ 1efd0f89b7351ccfc93cfb0faefa4edd7be5462f

AlpinDale ad24e74a99 feat: FP8 weight-only quantization support for Ampere GPUs		6 months ago
amd	251568470e initial nvidia fp8 e4m3 for kv cache	7 months ago
nvidia	3bdeb3e116 fix: clang formatting for all kernels (#558)	7 months ago
common.cu	37c6da9eb3 feat: vectorized fp8 quant kernel	7 months ago
fp8_marlin.cu	ad24e74a99 feat: FP8 weight-only quantization support for Ampere GPUs	6 months ago