AlpinDale
|
0e6c400b13
feat: re-add GGUF (#600)
|
4 months ago |
AlpinDale
|
9a50e3b4eb
refactor: minicpmv and port Idefix2VisionTransformer
|
4 months ago |
AlpinDale
|
6b1fdd07bd
chore: add isort and refactor formatting script and utils
|
4 months ago |
AlpinDale
|
2226c1b7bd
fix: replicatedlinear weight loading
|
4 months ago |
AlpinDale
|
ba371fbbbd
feat: AWQ marlin kernels (#603)
|
4 months ago |
AlpinDale
|
08373fd1ee
fix: asymmetric TP changes breaking the gptq and awq quants (#602)
|
4 months ago |
AlpinDale
|
9be43994fe
feat: fbgemm quantization support (#601)
|
4 months ago |
AlpinDale
|
6600c082bc
chore: pass bias to quant_method.apply
|
4 months ago |
AlpinDale
|
00503b9fc1
feat: non-uniform quantization via `compressed-tensors` for llama
|
4 months ago |
AlpinDale
|
5289c14b24
feat: Asymmetric Tensor Parallel (#594)
|
4 months ago |
AlpinDale
|
9d7beaa5b9
chore: separate kv_scale into k_scale and v_scale
|
4 months ago |
AlpinDale
|
d2f38f6f81
chore: remove separate bias add
|
4 months ago |
AlpinDale
|
6abf4e3883
fix: needs_scalar_to_array logic check in linear layer
|
4 months ago |
AlpinDale
|
ddb3323f94
refactor: have w8a8 compressed tensors use `process_weights_after_load` for fp8
|
5 months ago |
AlpinDale
|
272c64ab88
chore: allow loading fp8 models with fused qkv/mlp
|
5 months ago |
AlpinDale
|
772a126c08
chore: simplify fp8 weight loading
|
5 months ago |
AlpinDale
|
9b4c72a801
feat: support channel-wise quant for w8a8 dynamic per token activation quant
|
5 months ago |
AlpinDale
|
690110a051
feat: bitsandbytes quantization
|
5 months ago |
AlpinDale
|
f4ea11b982
feat: initial support for activation quantization
|
5 months ago |
AlpinDale
|
6fc1ec6e9a
fix redirects and improve low level debugging
|
5 months ago |
AlpinDale
|
7d23892501
static and dynamic fp8
|
5 months ago |
AlpinDale
|
b178ae4b4a
chore: generalize linear_method to be quant_method (#540)
|
5 months ago |
AlpinDale
|
fca911ee0a
vLLM Upstream Sync (#526)
|
6 months ago |
AlpinDale
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
8 months ago |
AlpinDale
|
e42a78381a
feat: switch from pylint to ruff (#322)
|
10 months ago |
AlpinDale
|
c2d77b1822
chore: logging refactor (#302)
|
10 months ago |
AlpinDale
|
705821a7fe
feat: AQLM quantization support (#293)
|
10 months ago |
AlpinDale
|
72229a94da
feat: better marlin kernels (#285)
|
10 months ago |
AlpinDale
|
ea0f57b233
feat: allow further support for non-cuda devices (#247)
|
11 months ago |
AlpinDale
|
c3a221eb02
feat: GGUF, QuIP#, and Marlin support (#228)
|
11 months ago |