Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| AlpinDale | 9be43994fe | feat: fbgemm quantization support (#601) | 4 months ago |
| AlpinDale | 00503b9fc1 | feat: non-uniform quantization via `compressed-tensors` for llama | 4 months ago |
| AlpinDale | 19340b672e | chore: improve min_capability checking for `compressed-tensors` | 4 months ago |
| AlpinDale | ee2c5d34da | feat: add fp8 channel-wise weight quantization support | 4 months ago |
| AlpinDale | 500f3b654f | fix: support bias term in compressed-tensors quant | 4 months ago |
| AlpinDale | 98cb1c4cd1 | feat: support fp8 via `llm-compressor` | 4 months ago |
| AlpinDale | 6e561ecda9 | chore: clean up `CompressedTensorsW8A8` | 4 months ago |
| AlpinDale | cda0e93a10 | abstract away the platform for device capability | 4 months ago |
| AlpinDale | 7d79c0e726 | chore: use nvml query to avoid accidental cuda initialization | 4 months ago |
| AlpinDale | ddb3323f94 | refactor: have w8a8 compressed tensors use `process_weights_after_load` for fp8 | 4 months ago |
| AlpinDale | 17f7089e26 | fix: `get_min_capability` for all quants | 4 months ago |
| AlpinDale | 9e75007c40 | chore: update w4a16 to wna16 and support w8a16 | 5 months ago |
| AlpinDale | b753ff7870 | feat: per-channel support for static activation quant | 5 months ago |
| AlpinDale | 9b4c72a801 | feat: support channel-wise quant for w8a8 dynamic per token activation quant | 5 months ago |
| AlpinDale | e2dbe5f05c | feat: add sparse marlin for compressed tensors | 5 months ago |
| AlpinDale | a33aaf3b42 | chore: cleanup compressed tensors | 5 months ago |
| AlpinDale | 1d00b61622 | feat: w4a16 support for compressed-tensors | 5 months ago |
| AlpinDale | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 5 months ago |
| AlpinDale | aba03b4756 | feat: dynamic per-token activation quantization | 5 months ago |
| AlpinDale | f4ea11b982 | feat: initial support for activation quantization | 5 months ago |