AlpinDale e32f506e17 chore: gpu arch guard for cutlass w8a8 kernels 7 months ago
..
aqlm 3bdeb3e116 fix: clang formatting for all kernels (#558) 7 months ago
autoquant 0307da9e15 refactor: bitsandbytes -> autoquant 7 months ago
awq 9d81716bfd [v0.5.3] Release Candidate (#388) 10 months ago
compressed_tensors 90bafca8e3 fix: cuda graphs with sparseml quants 7 months ago
cutlass_w8a8 e32f506e17 chore: gpu arch guard for cutlass w8a8 kernels 7 months ago
exl2 9d81716bfd [v0.5.3] Release Candidate (#388) 10 months ago
fp8 3bdeb3e116 fix: clang formatting for all kernels (#558) 7 months ago
gguf 9d81716bfd [v0.5.3] Release Candidate (#388) 10 months ago
gptq 3bdeb3e116 fix: clang formatting for all kernels (#558) 7 months ago
gptq_marlin 3bdeb3e116 fix: clang formatting for all kernels (#558) 7 months ago
int8_kvcache 9810daa699 feat: INT8 KV Cache (#298) 1 year ago
marlin d8667fcb98 improve gptq_marlin_24 prefill performance 7 months ago
quip aebd68c632 feat: backport kernels (#235) 1 year ago
squeezellm 8fa608aeb7 feat: replace Ray with NCCL for control plane comms (#221) 1 year ago
quant_ops.cpp f4ea11b982 feat: initial support for activation quantization 7 months ago
quant_ops.h 90bafca8e3 fix: cuda graphs with sparseml quants 7 months ago