Author    | Commit     | Message                                                                          | Date
AlpinDale | 9be43994fe | feat: fbgemm quantization support (#601)                                         | 4 months ago
AlpinDale | 00503b9fc1 | feat: non-uniform quantization via `compressed-tensors` for llama                | 4 months ago
AlpinDale | 19340b672e | chore: improve min_capability checking for `compressed-tensors`                  | 4 months ago
AlpinDale | ee2c5d34da | feat: add fp8 channel-wise weight quantization support                           | 4 months ago
AlpinDale | 500f3b654f | fix: support bias term in compressed-tensors quant                               | 4 months ago
AlpinDale | 98cb1c4cd1 | feat: support fp8 via `llm-compressor`                                           | 4 months ago
AlpinDale | 6e561ecda9 | chore: clean up `CompressedTensorsW8A8`                                          | 4 months ago
AlpinDale | cda0e93a10 | abstract away the platform for device capability                                 | 4 months ago
AlpinDale | 7d79c0e726 | chore: use nvml query to avoid accidental cuda initialization                    | 4 months ago
AlpinDale | ddb3323f94 | refactor: have w8a8 compressed tensors use `process_weights_after_load` for fp8  | 4 months ago
AlpinDale | 17f7089e26 | fix: `get_min_capability` for all quants                                         | 4 months ago
AlpinDale | 9e75007c40 | chore: update w4a16 to wna16 and support w8a16                                   | 5 months ago
AlpinDale | b753ff7870 | feat: per-channel support for static activation quant                            | 5 months ago
AlpinDale | 9b4c72a801 | feat: support channel-wise quant for w8a8 dynamic per token activation quant     | 5 months ago
AlpinDale | e2dbe5f05c | feat: add sparse marlin for compressed tensors                                   | 5 months ago
AlpinDale | a33aaf3b42 | chore: cleanup compressed tensors                                                | 5 months ago
AlpinDale | 1d00b61622 | feat: w4a16 support for compressed-tensors                                       | 5 months ago
AlpinDale | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)                    | 5 months ago
AlpinDale | aba03b4756 | feat: dynamic per-token activation quantization                                  | 5 months ago
AlpinDale | f4ea11b982 | feat: initial support for activation quantization                                | 5 months ago