AlpinDale | 0f4a9ee77b | quantized lm_head (#582) | 5 months ago
AlpinDale | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 6 months ago
AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 6 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 6 months ago
AlpinDale | 50b7c13db0 | refactor: attention selector (#552) | 6 months ago
AlpinDale | 9fba7f1d36 | remove quant_config from a few legacy models | 6 months ago
AlpinDale | b178ae4b4a | chore: generalize linear_method to be quant_method (#540) | 6 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 7 months ago