| Author | Commit | Message | Date |
|---|---|---|---|
| AlpinDale | 5d98b7ead1 | fix: input_scale for w8a8 is optional | 5 months ago |
| AlpinDale | 9be43994fe | feat: fbgemm quantization support (#601) | 5 months ago |
| AlpinDale | d3c474d219 | chore: enable dynamic per-token `fp8` | 5 months ago |
| AlpinDale | e90ad4acec | chore: implement fallback for fp8 channelwise using torch._scaled_mm | 5 months ago |
| AlpinDale | b5d23ab6d4 | chore: enable bias w/ FP8 layers in CUTLASS kernels | 5 months ago |
| AlpinDale | 500f3b654f | fix: support bias term in compressed-tensors quant | 5 months ago |
| AlpinDale | 98cb1c4cd1 | feat: support fp8 via `llm-compressor` | 5 months ago |