david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed (thanks to vLLM's Paged Attention). @ sampler_tests

AlpinDale 73177656ed feat: quant_llm support (#755)		3 months ago
..
aqlm	ccbda97416 fix: types in AQLM and GGUF for dynamo support (#736)	3 months ago
autoquant	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
awq	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
compressed_tensors	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
cutlass_w8a8	a401f8e05d feat: per-tensor token epilogue kernels (#630)	4 months ago
exl2	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
fp6	73177656ed feat: quant_llm support (#755)	3 months ago
fp8	b0f262eec1 feat: FP8 quantization support for AMD ROCm (#729)	3 months ago
gguf	ccbda97416 fix: types in AQLM and GGUF for dynamo support (#736)	3 months ago
gptq	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
gptq_marlin	6144150398 chore: use scalar type to dispatch to different `gptq_marlin` kernels (#689)	4 months ago
int8_kvcache	9810daa699 feat: INT8 KV Cache (#298)	10 months ago
marlin	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
quip	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
squeezellm	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
quant_ops.h	73177656ed feat: quant_llm support (#755)	3 months ago