compressed_tensors
|
e8b7f53321
allow prompt token IDs in the logits processor api
|
7 months ago |
gguf_utils
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
10 months ago |
__init__.py
|
f4ea11b982
feat: initial support for activation quantization
|
7 months ago |
aqlm.py
|
2649f3f14e
aqlm works on pascal
|
7 months ago |
awq.py
|
c66b1b57b1
Marlin 2:4 sparsity (#555)
|
7 months ago |
base_config.py
|
c66b1b57b1
Marlin 2:4 sparsity (#555)
|
7 months ago |
bitsandbytes.py
|
c66b1b57b1
Marlin 2:4 sparsity (#555)
|
7 months ago |
deepspeedfp.py
|
4acf34417a
feat: add DeepSpeedFP quantization for all models
|
7 months ago |
eetq.py
|
b178ae4b4a
chore: generalize linear_method to be quant_method (#540)
|
8 months ago |
exl2.py
|
b178ae4b4a
chore: generalize linear_method to be quant_method (#540)
|
8 months ago |
fp8.py
|
656459fd84
make fp8_e4m3 work on nvidia
|
7 months ago |
gguf.py
|
b178ae4b4a
chore: generalize linear_method to be quant_method (#540)
|
8 months ago |
gptq.py
|
c66b1b57b1
Marlin 2:4 sparsity (#555)
|
7 months ago |
gptq_marlin.py
|
5b0c11d190
support pipeline parallel pynccl groups
|
7 months ago |
gptq_marlin_24.py
|
f6250c5516
move dockerfiles to root; fix cpu build
|
7 months ago |
hadamard.safetensors
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
10 months ago |
marlin.py
|
c66b1b57b1
Marlin 2:4 sparsity (#555)
|
7 months ago |
quip.py
|
c66b1b57b1
Marlin 2:4 sparsity (#555)
|
7 months ago |
quip_utils.py
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
10 months ago |
schema.py
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
10 months ago |
squeezellm.py
|
f6250c5516
move dockerfiles to root; fix cpu build
|
7 months ago |