| Name | Last commit | Commit message | Updated |
|---|---|---|---|
| compressed_tensors | 92cee435e2 | rocm: add more quants, fix _scaled_mm call (#1062) | 6 days ago |
| gguf_utils | 8a71788372 | Add OLMoE (#772) | 2 months ago |
| kernels | f7f3fed265 | feat: add async postprocessor (#925) | 2 weeks ago |
| utils | 92cee435e2 | rocm: add more quants, fix _scaled_mm call (#1062) | 6 days ago |
| __init__.py | dcb36de9c4 | quants: add support for NVIDIA's ModelOpt checkpoints (#1013) | 1 week ago |
| aqlm.py | ccbda97416 | fix: types in AQLM and GGUF for dynamo support (#736) | 3 months ago |
| autoquant.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| awq.py | edec2e9a9e | feat: migrate awq and awq_marlin to AphroditeParameter (#702) | 3 months ago |
| awq_marlin.py | 93bc863591 | feat: Machete Kernels for Hopper GPUs (#842) | 1 month ago |
| awq_triton.py | cbde3c66a5 | quants: improve awq_triton throughput (#998) | 1 week ago |
| base_config.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| bitsandbytes.py | 6bdff60aab | quant: support pre-quanted bitsandbytes checkpoints (#961) | 2 weeks ago |
| deepspeedfp.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| eetq.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| exl2.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| experts_int8.py | 201db10f02 | models: add support for Phi3 MoE | 2 weeks ago |
| fbgemm_fp8.py | 92cee435e2 | rocm: add more quants, fix _scaled_mm call (#1062) | 6 days ago |
| fp6.py | 73177656ed | feat: quant_llm support (#755) | 3 months ago |
| fp8.py | 201db10f02 | models: add support for Phi3 MoE | 2 weeks ago |
| gguf.py | 0dfa6b60ec | core: support logprobs with multi-step scheduling (#963) | 2 weeks ago |
| gptq.py | 83af2524f3 | quants: add GPTQ and FBGEMM to AphroditeParameters (#987) | 2 weeks ago |
| gptq_marlin.py | 94a13ad036 | fix: gptq_marlin exception on older GPUs (#996) | 1 week ago |
| gptq_marlin_24.py | 5d9021969c | quants: update `qqq` and `gptq_marlin_24` to use AphroditeParameters (#921) | 2 weeks ago |
| hadamard.safetensors | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago |
| hqq_marlin.py | f98e7b2f8c | feat: add HQQ quantization support (#795) | 2 months ago |
| kv_cache.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| marlin.py | 799667737b | quantization: update marlin to use `AphroditeParameters` (#913) | 2 weeks ago |
| modelopt.py | dcb36de9c4 | quants: add support for NVIDIA's ModelOpt checkpoints (#1013) | 1 week ago |
| neuron_quant.py | 145e554a4d | neuron: add 8bit quantization for Neuron (#994) | 1 week ago |
| qqq.py | 8976805f90 | kernel: asymmetric AQ AZP quantization kernels (#1048) | 1 week ago |
| quip.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| quip_utils.py | 8a71788372 | Add OLMoE (#772) | 2 months ago |
| schema.py | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago |
| squeezellm.py | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| tpu_int8.py | f4b62bf803 | quant: update tpu_int8 to use AphroditeParameters (#959) | 2 weeks ago |