AlpinDale | 0c162c8dad | api: use fp32 for base64 embeddings (#919) | 1 month ago
AlpinDale | 3b684a8a54 | spec decode: streamline batch expansion tensor manipulation (#918) | 1 month ago
AlpinDale | fce970a846 | feat: multi-image input support for Phi3V (#917) | 1 month ago
AlpinDale | 178c2141d4 | fix: phi3v crash with unusual image sizes (#916) | 1 month ago
AlpinDale | f61acdd3ec | api: add json_schema to OpenAI server (#915) | 1 month ago
AlpinDale | b1492c1529 | core: add multi-step scheduling support for the synchronous engine (#914) | 1 month ago
AlpinDale | 799667737b | quantization: update marlin to use `AphroditeParameters` (#913) | 1 month ago
AlpinDale | 16e5b2be8b | fix: empty prompt crashing the server (#912) | 1 month ago
AlpinDale | 673621a3d2 | xpu: refactor the model runner for tensor parallelism (#910) | 1 month ago
AlpinDale | d69273bd2b | ray: better error when placement group topology is incorrect (#906) | 1 month ago
AlpinDale | 6fbab320e7 | api: error suppression cleanup + timeout suppression on aborts (#905) | 1 month ago
AlpinDale | ab533e0e60 | spec decode: fix logprobs when using speculative decoding (#904) | 1 month ago
AlpinDale | afc9a28aa0 | chore: add AphroditeParameter support for FP8 quant (#902) | 1 month ago
AlpinDale | 2a60b8f8c9 | kernel: do not compile machete for cuda 11 and below (#901) | 1 month ago
AlpinDale | 64c05b969a | fix: `ShardedStateLoader` with fp8 quant (#900) | 1 month ago
AlpinDale | 132aa2abe4 | spec decode: add support for EAGLE (#899) | 1 month ago
AlpinDale | bfc3da41ae | feat: add torch.compile for GemmaRMSNorm (#898) | 1 month ago
AlpinDale | a00ab49e21 | api: add client timeouts for the ZeroMQ server (#897) | 1 month ago
AlpinDale | 908ff753a1 | fix: phi_3.5_v loading (#896) | 1 month ago
AlpinDale | e14223dce5 | kernel: use `cub::BlockReduce` instead of custom impl (#895) | 1 month ago
AlpinDale | ff4b7236d5 | build: fix invalid path for envs.py in setup (#894) | 1 month ago
AlpinDale | f831fd8312 | rocm: fix compile issues with rocm 6.2 (#893) | 1 month ago
AlpinDale | 65b71f5fcc | distributed: fix issue for when nodes have multiple network interfaces (#892) | 1 month ago
AlpinDale | 653d1a08d4 | feat: add support for audio models (#891) | 1 month ago
AlpinDale | 22a4cd4595 | core: fix spec decode metrics and envs circular import (#889) | 1 month ago
AlpinDale | 901900854e | chore: consolidate environment variables within one file (#882) | 1 month ago
AlpinDale | ce6e3d63f7 | api: better startup failure UX (#881) | 1 month ago
AlpinDale | db6a50fd5c | async: disable multi-step scheduling for sync engine (#880) | 1 month ago
AlpinDale | afadef06cd | build: pass `PYTHONPATH` from setup.py to cmake (#879) | 1 month ago
AlpinDale | b5aa11020b | api: fix crashes under very high loads (#878) | 1 month ago