Commit History

Author SHA1 Message Date
  AlpinDale 79d603954e fix: chunked prefill with v2 block manager (#679) 4 months ago
  AlpinDale 3bbb3f2086 feat: add numpy implementation of `compute_slot_mapping` (#678) 4 months ago
  AlpinDale df208ab4e9 fix: fp8 checkpoints with fused linear modules (#677) 4 months ago
  AlpinDale 81fa31bcaf feat: embeddings support for batched OAI endpoint (#676) 4 months ago
  AlpinDale c2bb886b2e fix: reinit procedure in `ModelInputForGPUBuilder` (#675) 4 months ago
  AlpinDale bf88c8567e feat: mamba model support (#674) 4 months ago
  AlpinDale 8583aefed7 chore: mamba cache single buffer (#673) 4 months ago
  AlpinDale 19ad952dd4 chore: better stream termination in async engine (#672) 4 months ago
  AlpinDale 1394008421 chore: decouple `should_modify_greedy_probs_inplace` (#671) 4 months ago
  AlpinDale 2da6a3ec2b feat: option to apply temperature scaling last (#670) 4 months ago
  AlpinDale e3a53712f2 fix: mlpspeculator with padded vocab (#669) 4 months ago
  AlpinDale e200775863 feat: enable using fp8 kv and prefix caching with chunked prefill (#668) 4 months ago
  AlpinDale ef40c05cd3 fix: minor adjustments to scheduler and block manager (#667) 4 months ago
  AlpinDale 7df7b8ca53 optimization: reduce end-to-end overhead from python obj allocation (#666) 4 months ago
  AlpinDale ea78357d70 fix: deps with TPU dockerfile (#665) 4 months ago
  AlpinDale 62111fab17 feat: allow serving encoder-decoder models in the API server (#664) 4 months ago
  AlpinDale 3f49a55f82 feat: add INT8 W8A16 quant for TPU (#663) 4 months ago
  AlpinDale 5dd0145414 chore: update the env.py script and the bug report template (#662) 4 months ago
  AlpinDale 1927ce2be4 fix: `get_num_blocks_touched` logic (#661) 4 months ago
  AlpinDale ed9a6f97c1 fix: kill api server when pinging dead engine (#660) 4 months ago
  AlpinDale 6d54f7687d fix: lora with pipeline parallel (#659) 4 months ago
  AlpinDale 3405782f24 fix: max_num_batched_tokens should not be limited for lora (#658) 4 months ago
  AlpinDale 67ee885293 fix: flashinfer outputs (#657) 4 months ago
  AlpinDale 0e5bb11503 fix: make `merge_async_iterators.is_cancelled()` optional (#656) 4 months ago
  AlpinDale 3170c0d4c6 fix: GPTQ/AWQ on Colab (#655) 4 months ago
  AlpinDale 83bcb9119a fix: multiprocessing timeout (#654) 4 months ago
  AlpinDale 1e119cbeb6 fix: input processor in internvl2 (#653) 4 months ago
  AlpinDale a2344d3617 fix: move zeromq rpc frontend to IPC instead of TCP (#652) 4 months ago
  AlpinDale f1e1d0bd3d feat: introduce `BaseAphroditeParameter` (#646) 4 months ago
  AlpinDale 47ac074937 fix: RSLoRA support (#647) 4 months ago