AlpinDale
|
ba371fbbbd
feat: AWQ marlin kernels (#603)
|
4 months ago |
AlpinDale
|
9be43994fe
feat: fbgemm quantization support (#601)
|
4 months ago |
AlpinDale
|
7e9d4f3c71
chore: some more marlin cleanups
|
4 months ago |
AlpinDale
|
058e629f8e
chore: refactor marlin python utils
|
4 months ago |
AlpinDale
|
98cb1c4cd1
feat: support fp8 via `llm-compressor`
|
4 months ago |
AlpinDale
|
cda0e93a10
abstract away the platform for device capability
|
4 months ago |
AlpinDale
|
0f4a9ee77b
quantized lm_head (#582)
|
4 months ago |
AlpinDale
|
7d79c0e726
chore: use nvml query to avoid accidental cuda initialization
|
4 months ago |
AlpinDale
|
156f577f79
feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)
|
5 months ago |
AlpinDale
|
5cedee9024
fix gemma with gptq marlin
|
5 months ago |
AlpinDale
|
5b0c11d190
support pipeline parallel pynccl groups
|
5 months ago |
AlpinDale
|
c66b1b57b1
Marlin 2:4 sparsity (#555)
|
5 months ago |
AlpinDale
|
ad1c6b86a1
gptq_marlin: enable bfloat16
|
5 months ago |
AlpinDale
|
c154578c97
gptq_marlin: 8bit GPTQ support
|
5 months ago |
AlpinDale
|
ac5b4b6aa7
broadcast metadata through cpu
|
5 months ago |
AlpinDale
|
f22b700ee4
feat: marlin kernels for GPTQ (#547)
|
5 months ago |