compressed_tensors
|
92cee435e2
rocm: add more quants, fix _scaled_mm call (#1062)
|
1 week ago |
gguf_utils
|
8a71788372
Add OLMoE (#772)
|
2 months ago |
kernels
|
f7f3fed265
feat: add async postprocessor (#925)
|
2 weeks ago |
utils
|
92cee435e2
rocm: add more quants, fix _scaled_mm call (#1062)
|
1 week ago |
__init__.py
|
dcb36de9c4
quants: add support for NVIDIA's ModelOpt checkpoints (#1013)
|
1 week ago |
aqlm.py
|
ccbda97416
fix: types in AQLM and GGUF for dynamo support (#736)
|
3 months ago |
autoquant.py
|
f1d0b77c92
[0.6.0] Release Candidate (#481)
|
4 months ago |
awq.py
|
edec2e9a9e
feat: migrate awq and awq_marlin to AphroditeParameter (#702)
|
3 months ago |
awq_marlin.py
|
93bc863591
feat: Machete Kernels for Hopper GPUs (#842)
|
1 month ago |
awq_triton.py
|
cbde3c66a5
quants: improve awq_triton throughput (#998)
|
2 weeks ago |
base_config.py
|
f1d0b77c92
[0.6.0] Release Candidate (#481)
|
4 months ago |
bitsandbytes.py
|
6bdff60aab
quant: support pre-quanted bitsandbytes checkpoints (#961)
|
2 weeks ago |
deepspeedfp.py
|
f1d0b77c92
[0.6.0] Release Candidate (#481)
|
4 months ago |
eetq.py
|
f1d0b77c92
[0.6.0] Release Candidate (#481)
|
4 months ago |
exl2.py
|
f1d0b77c92
[0.6.0] Release Candidate (#481)
|
4 months ago |
experts_int8.py
|
201db10f02
models: add support for Phi3 MoE
|
2 weeks ago |
fbgemm_fp8.py
|
92cee435e2
rocm: add more quants, fix _scaled_mm call (#1062)
|
1 week ago |
fp6.py
|
73177656ed
feat: quant_llm support (#755)
|
3 months ago |
fp8.py
|
201db10f02
models: add support for Phi3 MoE
|
2 weeks ago |
gguf.py
|
0dfa6b60ec
core: support logprobs with multi-step scheduling (#963)
|
2 weeks ago |
gptq.py
|
83af2524f3
quants: add GPTQ and FBGEMM to AphroditeParameters (#987)
|
2 weeks ago |
gptq_marlin.py
|
94a13ad036
fix: gptq_marlin exception on older GPUs (#996)
|
2 weeks ago |
gptq_marlin_24.py
|
5d9021969c
quants: update `qqq` and `gptq_marlin_24` to use AphroditeParameters (#921)
|
3 weeks ago |
hadamard.safetensors
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
8 months ago |
hqq_marlin.py
|
f98e7b2f8c
feat: add HQQ quantization support (#795)
|
2 months ago |
kv_cache.py
|
f1d0b77c92
[0.6.0] Release Candidate (#481)
|
4 months ago |
marlin.py
|
799667737b
quantization: update marlin to use `AphroditeParameters` (#913)
|
3 weeks ago |
modelopt.py
|
dcb36de9c4
quants: add support for NVIDIA's ModelOpt checkpoints (#1013)
|
1 week ago |
neuron_quant.py
|
145e554a4d
neuron: add 8bit quantization for Neuron (#994)
|
2 weeks ago |
qqq.py
|
8976805f90
kernel: asymmetric AQ AZP quantization kernels (#1048)
|
1 week ago |
quip.py
|
f1d0b77c92
[0.6.0] Release Candidate (#481)
|
4 months ago |
quip_utils.py
|
8a71788372
Add OLMoE (#772)
|
2 months ago |
schema.py
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
8 months ago |
squeezellm.py
|
f1d0b77c92
[0.6.0] Release Candidate (#481)
|
4 months ago |
tpu_int8.py
|
f4b62bf803
quant: update tpu_int8 to use AphroditeParameters (#959)
|
2 weeks ago