AlpinDale | cc13f2d631 hack: get_last_latency crash | 2 weeks ago |
AlpinDale | a3c03db735 fix: inline model loading conflicts with lora (#930) | 2 weeks ago |
AlpinDale | 59d1d59028 api: support aphrodite_config.yaml with inline loading (#929) | 2 weeks ago |
AlpinDale | d46e70ac98 api: add inline model loading (#928) | 2 weeks ago |
AlpinDale | 8d9f1fd4e6 feat: add single user mode (#927) | 2 weeks ago |
AlpinDale | 53d0ba7c7c api: add endpoint for loading and unloading the model (#926) | 2 weeks ago |
AlpinDale | f7f3fed265 feat: add async postprocessor (#925) | 2 weeks ago |
AlpinDale | 5cb2e998d8 quants: update compressed tensors lifecycle to remove `prefix` from `create_weights` (#924) | 3 weeks ago |
AlpinDale | 0c6d90dade neuron: add support for tensor parallelism (#923) | 3 weeks ago |
AlpinDale | 2940da2c7b distributed: fix custom allreduce p2p cache file generation (#922) | 3 weeks ago |
AlpinDale | 5d9021969c quants: update `qqq` and `gptq_marlin_24` to use AphroditeParameters (#921) | 3 weeks ago |
AlpinDale | 9c9b2dd843 core: improve warmup times for prefix caching in block manager v2 (#920) | 3 weeks ago |
AlpinDale | 0c162c8dad api: use fp32 for base64 embeddings (#919) | 3 weeks ago |
AlpinDale | 3b684a8a54 spec decode: streamline batch expansion tensor manipulation (#918) | 3 weeks ago |
AlpinDale | fce970a846 feat: multi-image input support for Phi3V (#917) | 3 weeks ago |
AlpinDale | 178c2141d4 fix: phi3v crash with unusual image sizes (#916) | 3 weeks ago |
AlpinDale | f61acdd3ec api: add json_schema to OpenAI server (#915) | 3 weeks ago |
AlpinDale | b1492c1529 core: add multi-step scheduling support for the synchronous engine (#914) | 3 weeks ago |
AlpinDale | 799667737b quantization: update marlin to use `AphroditeParameters` (#913) | 3 weeks ago |
AlpinDale | 16e5b2be8b fix: empty prompt crashing the server (#912) | 3 weeks ago |
AlpinDale | 673621a3d2 xpu: refactor the model runner for tensor parallelism (#910) | 3 weeks ago |
AlpinDale | d69273bd2b ray: better error when placement group topology is incorrect (#906) | 3 weeks ago |
AlpinDale | 6fbab320e7 api: error suppression cleanup + timeout suppression on aborts (#905) | 3 weeks ago |
AlpinDale | ab533e0e60 spec decode: fix logprobs when using speculative decoding (#904) | 3 weeks ago |
AlpinDale | afc9a28aa0 chore: add AphroditeParameter support for FP8 quant (#902) | 3 weeks ago |
AlpinDale | 2a60b8f8c9 kernel: do not compile machete for cuda 11 and below (#901) | 3 weeks ago |
AlpinDale | 64c05b969a fix: `ShardedStateLoader` with fp8 quant (#900) | 3 weeks ago |
AlpinDale | 132aa2abe4 spec decode: add support for EAGLE (#899) | 3 weeks ago |
AlpinDale | bfc3da41ae feat: add torch.compile for GemmaRMSNorm (#898) | 3 weeks ago |
AlpinDale | a00ab49e21 api: add client timeouts for the ZeroMQ server (#897) | 3 weeks ago |