david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to vLLM's Paged Attention). @ linear_weights_light

AlpinDale 8cf8fa3f09 don't assign meta tensor to a cuda param		7 months ago
..
aqlm	705821a7fe feat: AQLM quantization support (#293)	10 months ago
awq	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago
bitsandbytes	a98babfb74 fix: bnb on Turing GPUs (#299)	10 months ago
exl2	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago
fp8	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago
fp8_e5m2_kvcache	8e1cd54497 fix: do not include fp8 for rocm (#271)	10 months ago
gguf	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago
gptq	8cf8fa3f09 don't assign meta tensor to a cuda param	7 months ago
int8_kvcache	9810daa699 feat: INT8 KV Cache (#298)	10 months ago
marlin	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago
quip	aebd68c632 feat: backport kernels (#235)	11 months ago
squeezellm	8fa608aeb7 feat: replace Ray with NCCL for control plane comms (#221)	11 months ago
quant_ops.cpp	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago
quant_ops.h	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago