Author | Commit | Message | Date
--- | --- | --- | ---
AlpinDale | ad24e74a99 | feat: FP8 weight-only quantization support for Ampere GPUs | 6 months ago
AlpinDale | 5b464d36ea | feat: bias epilogue support for cutlass kernels | 7 months ago
AlpinDale | cd9ed8623b | fix: cuda version check for fp8 support in the cutlass kernels | 7 months ago
AlpinDale | 7e54c3916d | chore: factor out epilogues from cutlass kernels | 7 months ago
AlpinDale | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago
AlpinDale | aba03b4756 | feat: dynamic per-token activation quantization | 7 months ago
AlpinDale | 90bafca8e3 | fix: cuda graphs with sparseml quants | 7 months ago
AlpinDale | f4ea11b982 | feat: initial support for activation quantization | 7 months ago
AlpinDale | 3bdeb3e116 | fix: clang formatting for all kernels (#558) | 7 months ago
AlpinDale | 2313c97e3d | add cutlass w8a8 kernels (#556) | 7 months ago
AlpinDale | c66b1b57b1 | Marlin 2:4 sparsity (#555) | 7 months ago
AlpinDale | c154578c97 | gptq_marlin: 8bit GPTQ support | 7 months ago
AlpinDale | f22b700ee4 | feat: marlin kernels for GPTQ (#547) | 7 months ago
AlpinDale | 8de8034f8b | include fp8 compilation in rocm | 7 months ago
AlpinDale | 36660b55c2 | chore: mixtral fp8 w/ static scales (#542) | 7 months ago
AlpinDale | a6a627d745 | fix aqlm compilation | 7 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 8 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago
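
Commit 156f577f79 mentions moving custom-op bindings from `PYBIND11_MODULE` to `TORCH_LIBRARY`. The sketch below illustrates the general shape of that migration pattern in PyTorch C++ extensions, under the assumption of a made-up namespace (`my_ext`) and op (`scaled_add`); it is not the repository's actual code.

```cpp
// Minimal, hypothetical sketch of a PYBIND11_MODULE -> TORCH_LIBRARY migration.
// The namespace "my_ext" and the op "scaled_add" are illustrative only.
#include <ATen/ATen.h>
#include <torch/library.h>

// Example custom op: out = a + scale * b
at::Tensor scaled_add(const at::Tensor& a, const at::Tensor& b, double scale) {
  return a + b * scale;
}

// Old style: a plain pybind11 binding, only reachable from Python as a
// regular function on the extension module.
//
// PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
//   m.def("scaled_add", &scaled_add, "a + scale * b");
// }

// New style: declare a schema with TORCH_LIBRARY and register backend
// implementations with TORCH_LIBRARY_IMPL, so the op goes through the
// dispatcher and is callable as torch.ops.my_ext.scaled_add(...).
TORCH_LIBRARY(my_ext, m) {
  m.def("scaled_add(Tensor a, Tensor b, float scale) -> Tensor");
}

TORCH_LIBRARY_IMPL(my_ext, CPU, m) {
  m.impl("scaled_add", &scaled_add);
}
```

With the dispatcher-registered form, the same op can gain additional backend registrations (for example a CUDA `TORCH_LIBRARY_IMPL` block) without changing the Python-facing call site.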