AlpinDale | d357341203 | chore: add pipeline parallel support for Qwen | 4 months ago
AlpinDale | 1efd0f89b7 | feat: support FP8 for DeepSeekV2 MoE | 4 months ago
AlpinDale | 9622c59f8f | chore: support 2D input shape in MoE layer | 4 months ago
AlpinDale | 0f4a9ee77b | quantized lm_head (#582) | 4 months ago
AlpinDale | cf472315cc | refactor: isolate FP8 from mixtral | 4 months ago
AlpinDale | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 4 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 5 months ago
AlpinDale | 676322dd62 | qwen2_moe: mlp_only_layers | 5 months ago
AlpinDale | 50b7c13db0 | refactor: attention selector (#552) | 5 months ago
AlpinDale | b178ae4b4a | chore: generalize linear_method to be quant_method (#540) | 5 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 6 months ago
AlpinDale | fc80f57967 | fix: correct file name for qwen2 moe | 7 months ago