AlpinDale | 8b42b58228 | vlm: stack multimodal tensors to represent multiple images within each prompt (#937) | 2 weeks ago
AlpinDale | c50309d386 | model: add support for paligemma2 (#936) | 2 weeks ago
AlpinDale | 03bd85c950 | chore: multi-image support for llava-next (#935) | 2 weeks ago
AlpinDale | 9f3e7c86e2 | feat: add fused Marlin MoE kernel (#934) | 2 weeks ago
AlpinDale | 9b76e7f39b | fix: phi3v image_idx in async server (#933) | 2 weeks ago
AlpinDale | 15cb8d5c26 | xpu: support pipeline parallel (#932) | 2 weeks ago
AlpinDale | 436d8fa0f1 | core: do not compile for profiling (#931) | 2 weeks ago
AlpinDale | a3c03db735 | fix: inline model loading conflicts with lora (#930) | 3 weeks ago
AlpinDale | 59d1d59028 | api: support aphrodite_config.yaml with inline loading (#929) | 3 weeks ago
AlpinDale | d46e70ac98 | api: add inline model loading (#928) | 3 weeks ago
AlpinDale | 8d9f1fd4e6 | feat: add single user mode (#927) | 3 weeks ago
AlpinDale | 53d0ba7c7c | api: add endpoint for loading and unloading the model (#926) | 3 weeks ago
AlpinDale | f7f3fed265 | feat: add async postprocessor (#925) | 3 weeks ago
AlpinDale | 5cb2e998d8 | quants: update compressed tensors lifecycle to remove `prefix` from `create_weights` (#924) | 3 weeks ago
AlpinDale | 0c6d90dade | neuron: add support for tensor parallelism (#923) | 3 weeks ago
AlpinDale | 2940da2c7b | distributed: fix custom allreduce p2p cache file generation (#922) | 3 weeks ago
AlpinDale | 5d9021969c | quants: update `qqq` and `gptq_marlin_24` to use AphroditeParameters (#921) | 3 weeks ago
AlpinDale | 9c9b2dd843 | core: improve warmup times for prefix caching in block manager v2 (#920) | 3 weeks ago
AlpinDale | 0c162c8dad | api: use fp32 for base64 embeddings (#919) | 3 weeks ago
AlpinDale | 3b684a8a54 | spec decode: streamline batch expansion tensor manipulation (#918) | 3 weeks ago
AlpinDale | fce970a846 | feat: multi-image input support for Phi3V (#917) | 3 weeks ago
AlpinDale | 178c2141d4 | fix: phi3v crash with unusual image sizes (#916) | 3 weeks ago
AlpinDale | f61acdd3ec | api: add json_schema to OpenAI server (#915) | 3 weeks ago
AlpinDale | b1492c1529 | core: add multi-step scheduling support for the synchronous engine (#914) | 3 weeks ago
AlpinDale | 799667737b | quantization: update marlin to use `AphroditeParameters` (#913) | 3 weeks ago
AlpinDale | 16e5b2be8b | fix: empty prompt crashing the server (#912) | 3 weeks ago
AlpinDale | 673621a3d2 | xpu: refactor the model runner for tensor parallelism (#910) | 3 weeks ago
AlpinDale | d69273bd2b | ray: better error when placement group topology is incorrect (#906) | 3 weeks ago
AlpinDale | 6fbab320e7 | api: error suppression cleanup + timeout suppression on aborts (#905) | 3 weeks ago
AlpinDale | ab533e0e60 | spec decode: fix logprobs when using speculative decoding (#904) | 3 weeks ago