AlpinDale | 00503b9fc1 | feat: non-uniform quantization via `compressed-tensors` for llama | 4 months ago
AlpinDale | 6c4c20652b | feat: pipeline parallel support for mixtral | 4 months ago
AlpinDale | 9d7beaa5b9 | chore: separate kv_scale into k_scale and v_scale | 4 months ago
AlpinDale | 1efd0f89b7 | feat: support FP8 for DeepSeekV2 MoE | 4 months ago
AlpinDale | 9622c59f8f | chore: support 2D input shape in MoE layer | 4 months ago
AlpinDale | 0f4a9ee77b | quantized lm_head (#582) | 4 months ago
AlpinDale | cf472315cc | refactor: isolate FP8 from mixtral | 4 months ago
AlpinDale | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 4 months ago
AlpinDale | c5d8028668 | fix: no need to redefine supports_vision and supports_lora in model class | 5 months ago
AlpinDale | 56e0b8223c | chore: add base class for LoRA-supported models | 5 months ago
AlpinDale | ab5ffb228c | fp8: `act_scale` -> `input_scale` | 5 months ago
AlpinDale | 39b36efabf | fix: mixtral fp8 ckpt loading | 5 months ago
AlpinDale | 67084aca5b | do not build cutlass kernels if cuda version is too low | 5 months ago
AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 5 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 5 months ago
AlpinDale | d7c0dd5b50 | fix: do not set the weight to fp8 for fp16 checkpoints | 5 months ago
AlpinDale | 50b7c13db0 | refactor: attention selector (#552) | 5 months ago
AlpinDale | 40a59cca1d | support fp8 mixtral checkpoints with static weights and dynamic/static acts | 5 months ago
AlpinDale | a12a1bbf82 | fix mixtral for non-cuda devices | 5 months ago
AlpinDale | 36660b55c2 | chore: mixtral fp8 w/ static scales (#542) | 5 months ago
AlpinDale | b178ae4b4a | chore: generalize linear_method to be quant_method (#540) | 5 months ago
AlpinDale | 46159b107a | formatting: pt1 | 6 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 6 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 9 months ago
AlpinDale | da223153c6 | feat&fix: cohere support and missing GPU blocks (#333) | 10 months ago
AlpinDale | e42a78381a | feat: switch from pylint to ruff (#322) | 10 months ago
AlpinDale | e31c6f0b45 | feat: refactor modeling logic and support more models (#274) | 10 months ago
AlpinDale | 7d6ba53602 | feat: fused top-k kernels for MoE (#273) | 10 months ago
AlpinDale | 224b87b484 | feat: add fused mixtral moe support (#238) | 10 months ago