.. |
aqlm
|
3bdeb3e116
fix: clang formatting for all kernels (#558)
|
7 months ago |
autoquant
|
0307da9e15
refactor: bitsandbytes -> autoquant
|
7 months ago |
awq
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
10 months ago |
compressed_tensors
|
90bafca8e3
fix: cuda graphs with sparseml quants
|
7 months ago |
cutlass_w8a8
|
e32f506e17
chore: gpu arch guard for cutlass w8a8 kernels
|
7 months ago |
exl2
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
10 months ago |
fp8
|
3bdeb3e116
fix: clang formatting for all kernels (#558)
|
7 months ago |
gguf
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
10 months ago |
gptq
|
3bdeb3e116
fix: clang formatting for all kernels (#558)
|
7 months ago |
gptq_marlin
|
3bdeb3e116
fix: clang formatting for all kernels (#558)
|
7 months ago |
int8_kvcache
|
9810daa699
feat: INT8 KV Cache (#298)
|
1 year ago |
marlin
|
d8667fcb98
improve gptq_marlin_24 prefill performance
|
7 months ago |
quip
|
aebd68c632
feat: backport kernels (#235)
|
1 year ago |
squeezellm
|
8fa608aeb7
feat: replace Ray with NCCL for control plane comms (#221)
|
1 year ago |
quant_ops.cpp
|
f4ea11b982
feat: initial support for activation quantization
|
7 months ago |
quant_ops.h
|
90bafca8e3
fix: cuda graphs with sparseml quants
|
7 months ago |