Commit History

Author SHA1 Message Date
  AlpinDale 0c162c8dad api: use fp32 for base64 embeddings (#919) 1 month ago
  AlpinDale 3b684a8a54 spec decode: streamline batch expansion tensor manipulation (#918) 1 month ago
  AlpinDale fce970a846 feat: multi-image input support for Phi3V (#917) 1 month ago
  AlpinDale 178c2141d4 fix: phi3v crash with unusual image sizes (#916) 1 month ago
  AlpinDale f61acdd3ec api: add json_schema to OpenAI server (#915) 1 month ago
  AlpinDale b1492c1529 core: add multi-step scheduling support for the synchronous engine (#914) 1 month ago
  AlpinDale 799667737b quantization: update marlin to use `AphroditeParameters` (#913) 1 month ago
  AlpinDale 16e5b2be8b fix: empty prompt crashing the server (#912) 1 month ago
  AlpinDale 673621a3d2 xpu: refactor the model runner for tensor parallelism (#910) 1 month ago
  AlpinDale d69273bd2b ray: better error when placement group topology is incorrect (#906) 1 month ago
  AlpinDale 6fbab320e7 api: error suppression cleanup + timeout suppression on aborts (#905) 1 month ago
  AlpinDale ab533e0e60 spec decode: fix logprobs when using speculative decoding (#904) 1 month ago
  AlpinDale afc9a28aa0 chore: add AphroditeParameter support for FP8 quant (#902) 1 month ago
  AlpinDale 2a60b8f8c9 kernel: do not compile machete for cuda 11 and below (#901) 1 month ago
  AlpinDale 64c05b969a fix: `ShardedStateLoader` with fp8 quant (#900) 1 month ago
  AlpinDale 132aa2abe4 spec decode: add support for EAGLE (#899) 1 month ago
  AlpinDale bfc3da41ae feat: add torch.compile for GemmaRMSNorm (#898) 1 month ago
  AlpinDale a00ab49e21 api: add client timeouts for the ZeroMQ server (#897) 1 month ago
  AlpinDale 908ff753a1 fix: phi_3.5_v loading (#896) 1 month ago
  AlpinDale e14223dce5 kernel: use `cub::BlockReduce` instead of custom impl (#895) 1 month ago
  AlpinDale ff4b7236d5 build: fix invalid path for envs.py in setup (#894) 1 month ago
  AlpinDale f831fd8312 rocm: fix compile issues with rocm 6.2 (#893) 1 month ago
  AlpinDale 65b71f5fcc distributed: fix issue for when nodes have multiple network interfaces (#892) 1 month ago
  AlpinDale 653d1a08d4 feat: add support for audio models (#891) 1 month ago
  AlpinDale 22a4cd4595 core: fix spec decode metrics and envs circular import (#889) 1 month ago
  AlpinDale 901900854e chore: consolidate environment variables within one file (#882) 1 month ago
  AlpinDale ce6e3d63f7 api: better startup failure UX (#881) 1 month ago
  AlpinDale db6a50fd5c async: disable multi-step scheduling for sync engine (#880) 1 month ago
  AlpinDale afadef06cd build: pass `PYTHONPATH` from setup.py to cmake (#879) 1 month ago
  AlpinDale b5aa11020b api: fix crashes under very high loads (#878) 1 month ago