david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's Paged Attention. @ 801eda0b7a3c3c44ca165c95b461800a31bb770c

AlpinDale 801eda0b7a feat: support GPTQ 2, 3, and 8bit quants (#181)	1 year ago
attention	b9b295d74e chore: backlogs 1 (#191)	1 year ago
quantization	801eda0b7a feat: support GPTQ 2, 3, and 8bit quants (#181)	1 year ago
activation_kernels.cu	b9b295d74e chore: backlogs 1 (#191)	1 year ago
cache.h	1aab8a7d6f feat: speedup compilation times by 3x (#130)	1 year ago
cache_kernels.cu	b9b295d74e chore: backlogs 1 (#191)	1 year ago
cuda_compat.h	1334a833a4 feat: AMD ROCm support (#95)	1 year ago
cuda_utils.h	1aab8a7d6f feat: speedup compilation times by 3x (#130)	1 year ago
cuda_utils_kernels.cu	1334a833a4 feat: AMD ROCm support (#95)	1 year ago
dispatch_utils.h	32844c1522 add GELU kernels and remove compile bloat	1 year ago
layernorm_kernels.cu	b9b295d74e chore: backlogs 1 (#191)	1 year ago
ops.h	801eda0b7a feat: support GPTQ 2, 3, and 8bit quants (#181)	1 year ago
pos_encoding_kernels.cu	b9b295d74e chore: backlogs 1 (#191)	1 year ago
pybind.cpp	62b2c4119d feat: re-write GPTQ and refactor exllama kernels (#152)	1 year ago
reduction.cuh	1334a833a4 feat: AMD ROCm support (#95)	1 year ago