Author    | Commit     | Message                                                                          | Date
AlpinDale | 9be43994fe | feat: fbgemm quantization support (#601)                                         | 4 months ago
AlpinDale | 00503b9fc1 | feat: non-uniform quantization via `compressed-tensors` for llama                | 4 months ago
AlpinDale | 19340b672e | chore: improve min_capability checking for `compressed-tensors`                  | 4 months ago
AlpinDale | ee2c5d34da | feat: add fp8 channel-wise weight quantization support                           | 4 months ago
AlpinDale | 500f3b654f | fix: support bias term in compressed-tensors quant                               | 4 months ago
AlpinDale | 98cb1c4cd1 | feat: support fp8 via `llm-compressor`                                           | 4 months ago
AlpinDale | 6e561ecda9 | chore: clean up `CompressedTensorsW8A8`                                          | 4 months ago
AlpinDale | cda0e93a10 | abstract away the platform for device capability                                 | 4 months ago
AlpinDale | 7d79c0e726 | chore: use nvml query to avoid accidental cuda initialization                    | 4 months ago
AlpinDale | ddb3323f94 | refactor: have w8a8 compressed tensors use `process_weights_after_load` for fp8  | 4 months ago
AlpinDale | 17f7089e26 | fix: `get_min_capability` for all quants                                         | 4 months ago
AlpinDale | 9e75007c40 | chore: update w4a16 to wna16 and support w8a16                                   | 5 months ago
AlpinDale | b753ff7870 | feat: per-channel support for static activation quant                            | 5 months ago
AlpinDale | 9b4c72a801 | feat: support channel-wise quant for w8a8 dynamic per token activation quant     | 5 months ago
AlpinDale | e2dbe5f05c | feat: add sparse marlin for compressed tensors                                   | 5 months ago
AlpinDale | a33aaf3b42 | chore: cleanup compressed tensors                                                | 5 months ago
AlpinDale | 1d00b61622 | feat: w4a16 support for compressed-tensors                                       | 5 months ago
AlpinDale | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)                    | 5 months ago
AlpinDale | aba03b4756 | feat: dynamic per-token activation quantization                                  | 5 months ago
AlpinDale | f4ea11b982 | feat: initial support for activation quantization                                | 5 months ago