david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users at high speed (thanks to vLLM's Paged Attention). @ ec5b99d075c728d0ab4188ce1f277d24b45d389c

AlpinDale 39b36efabf fix: mixtral fp8 ckpt loading		7 months ago
..
compressed_tensors	90bafca8e3 fix: cuda graphs with sparseml quants	7 months ago
gguf_utils	9d81716bfd [v0.5.3] Release Candidate (#388)	10 months ago
__init__.py	690110a051 feat: bitsandbytes quantization	7 months ago
aqlm.py	2649f3f14e aqlm works on pascal	7 months ago
autoquant.py	0307da9e15 refactor: bitsandbytes -> autoquant	7 months ago
awq.py	c66b1b57b1 Marlin 2:4 sparsity (#555)	7 months ago
base_config.py	c66b1b57b1 Marlin 2:4 sparsity (#555)	7 months ago
bitsandbytes.py	690110a051 feat: bitsandbytes quantization	7 months ago
deepspeedfp.py	4acf34417a feat: add DeepSpeedFP quantization for all models	8 months ago
eetq.py	b178ae4b4a chore: generalize linear_method to be quant_method (#540)	8 months ago
exl2.py	b178ae4b4a chore: generalize linear_method to be quant_method (#540)	8 months ago
fp8.py	39b36efabf fix: mixtral fp8 ckpt loading	7 months ago
gguf.py	b178ae4b4a chore: generalize linear_method to be quant_method (#540)	8 months ago
gptq.py	c66b1b57b1 Marlin 2:4 sparsity (#555)	7 months ago
gptq_marlin.py	5cedee9024 fix gemma with gptq marlin	7 months ago
gptq_marlin_24.py	f6250c5516 move dockerfiles to root; fix cpu build	7 months ago
hadamard.safetensors	9d81716bfd [v0.5.3] Release Candidate (#388)	10 months ago
marlin.py	c66b1b57b1 Marlin 2:4 sparsity (#555)	7 months ago
quip.py	c66b1b57b1 Marlin 2:4 sparsity (#555)	7 months ago
quip_utils.py	9d81716bfd [v0.5.3] Release Candidate (#388)	10 months ago
schema.py	9d81716bfd [v0.5.3] Release Candidate (#388)	10 months ago
squeezellm.py	f6250c5516 move dockerfiles to root; fix cpu build	7 months ago