Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| AlpinDale | ad24e74a99 | feat: FP8 weight-only quantization support for Ampere GPUs | 6 months ago |
| AlpinDale | 5b464d36ea | feat: bias epilogue support for cutlass kernels | 7 months ago |
| AlpinDale | cd9ed8623b | fix: cuda version check for fp8 support in the cutlass kernels | 7 months ago |
| AlpinDale | 7e54c3916d | chore: factor out epilogues from cutlass kernels | 7 months ago |
| AlpinDale | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago |
| AlpinDale | aba03b4756 | feat: dynamic per-token activation quantization | 7 months ago |
| AlpinDale | 90bafca8e3 | fix: cuda graphs with sparseml quants | 7 months ago |
| AlpinDale | f4ea11b982 | feat: initial support for activation quantization | 7 months ago |
| AlpinDale | 3bdeb3e116 | fix: clang formatting for all kernels (#558) | 7 months ago |
| AlpinDale | 2313c97e3d | add cutlass w8a8 kernels (#556) | 7 months ago |
| AlpinDale | c66b1b57b1 | Marlin 2:4 sparsity (#555) | 7 months ago |
| AlpinDale | c154578c97 | gptq_marlin: 8bit GPTQ support | 7 months ago |
| AlpinDale | f22b700ee4 | feat: marlin kernels for GPTQ (#547) | 7 months ago |
| AlpinDale | 8de8034f8b | include fp8 compilation in rocm | 7 months ago |
| AlpinDale | 36660b55c2 | chore: mixtral fp8 w/ static scales (#542) | 7 months ago |
| AlpinDale | a6a627d745 | fix aqlm compilation | 7 months ago |
| AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 8 months ago |
| AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |