david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed (thanks to vLLM's Paged Attention).

AlpinDale 92cee435e2 rocm: add more quants, fix _scaled_mm call (#1062)		1 week ago
compressed_tensors	92cee435e2 rocm: add more quants, fix _scaled_mm call (#1062)	1 week ago
gguf_utils	8a71788372 Add OLMoE (#772)	2 months ago
kernels	f7f3fed265 feat: add async postprocessor (#925)	2 weeks ago
utils	92cee435e2 rocm: add more quants, fix _scaled_mm call (#1062)	1 week ago
__init__.py	dcb36de9c4 quants: add support for NVIDIA's ModelOpt checkpoints (#1013)	1 week ago
aqlm.py	ccbda97416 fix: types in AQLM and GGUF for dynamo support (#736)	3 months ago
autoquant.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
awq.py	edec2e9a9e feat: migrate awq and awq_marlin to AphroditeParameter (#702)	3 months ago
awq_marlin.py	93bc863591 feat: Machete Kernels for Hopper GPUs (#842)	1 month ago
awq_triton.py	cbde3c66a5 quants: improve awq_triton throughput (#998)	2 weeks ago
base_config.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
bitsandbytes.py	6bdff60aab quant: support pre-quanted bitsandbytes checkpoints (#961)	2 weeks ago
deepspeedfp.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
eetq.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
exl2.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
experts_int8.py	201db10f02 models: add support for Phi3 MoE	2 weeks ago
fbgemm_fp8.py	92cee435e2 rocm: add more quants, fix _scaled_mm call (#1062)	1 week ago
fp6.py	73177656ed feat: quant_llm support (#755)	3 months ago
fp8.py	201db10f02 models: add support for Phi3 MoE	2 weeks ago
gguf.py	0dfa6b60ec core: support logprobs with multi-step scheduling (#963)	2 weeks ago
gptq.py	83af2524f3 quants: add GPTQ and FBGEMM to AphroditeParameters (#987)	2 weeks ago
gptq_marlin.py	94a13ad036 fix: gptq_marlin exception on older GPUs (#996)	2 weeks ago
gptq_marlin_24.py	5d9021969c quants: update `qqq` and `gptq_marlin_24` to use AphroditeParameters (#921)	3 weeks ago
hadamard.safetensors	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago
hqq_marlin.py	f98e7b2f8c feat: add HQQ quantization support (#795)	2 months ago
kv_cache.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
marlin.py	799667737b quantization: update marlin to use `AphroditeParameters` (#913)	3 weeks ago
modelopt.py	dcb36de9c4 quants: add support for NVIDIA's ModelOpt checkpoints (#1013)	1 week ago
neuron_quant.py	145e554a4d neuron: add 8bit quantization for Neuron (#994)	2 weeks ago
qqq.py	8976805f90 kernel: asymmetric AQ AZP quantization kernels (#1048)	1 week ago
quip.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
quip_utils.py	8a71788372 Add OLMoE (#772)	2 months ago
schema.py	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago
squeezellm.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
tpu_int8.py	f4b62bf803 quant: update tpu_int8 to use AphroditeParameters (#959)	2 weeks ago