AlpinDale | 3a0fdf7b9b | chore: remove `image_input_type` from VLM config | 7 months ago
AlpinDale | 7253e9052d | feat: integrate typical acceptance sampling for spec decoding | 7 months ago
AlpinDale | b8a19ba27f | chore: extend aphrodite metrics logging api | 7 months ago
AlpinDale | 0886c361f4 | feat: OpenVINO CPU backend (#576) | 7 months ago
AlpinDale | c0c336aaa3 | refactor: registry for processing model inputs; quick_gelu; clip model support | 7 months ago
AlpinDale | 80ac1cdc8f | fix: add args for the draft tp | 7 months ago
AlpinDale | d44ac8e497 | fix: `--preemption_mode` -> `--preemption-mode` | 7 months ago
AlpinDale | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 7 months ago
AlpinDale | fe21123a1c | feat: TPU support (#570) | 7 months ago
AlpinDale | fa58ba87a3 | fix: only set executor backend to mp if not multi-node | 7 months ago
AlpinDale | bba89fc6d3 | chore: make the automatic rope scaling behave properly with rope_scaling arg, add rope theta | 7 months ago
AlpinDale | c61d9f1aa3 | fix: lora_dtype value in args | 7 months ago
AlpinDale | ec5b99d075 | fix: use named args | 7 months ago
AlpinDale | 237fa59aea | feat: support CPU/GPU swapping in BlockManagerV2 | 7 months ago
AlpinDale | 8d77c69cbd | feat: support image processor and add llava example | 7 months ago
AlpinDale | 690110a051 | feat: bitsandbytes quantization | 7 months ago
AlpinDale | f40b809d3b | allow using v2 block manager with sliding window | 7 months ago
AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 7 months ago
AlpinDale | f6250c5516 | move dockerfiles to root; fix cpu build | 7 months ago
AlpinDale | 4e1ae004da | make mp the default distributed backend | 7 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago
AlpinDale | 60e74e92fd | add rope_scaling arg | 7 months ago
AlpinDale | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago
AlpinDale | 7bcff4ac03 | implement sharded state dict | 7 months ago
AlpinDale | 13e5ffd456 | fix distributed_executor_backend in args | 7 months ago
AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 7 months ago
AlpinDale | c6a501f682 | add multiprocessing executor; make ray optional | 7 months ago
AlpinDale | 0cea453d36 | automatically detect tensorized models | 7 months ago
AlpinDale | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 7 months ago
AlpinDale | 4acf34417a | feat: add DeepSpeedFP quantization for all models | 7 months ago