Commit History

Author SHA1 Message Date
  AlpinDale 4599c98f99 feat: dynamic image size support for VLMs 6 months ago
  AlpinDale 5be90c3859 Mamba infrastructure support (#586) 6 months ago
  AlpinDale ae04f57ec1 feat: Pipeline Parallel support (#581) 7 months ago
  AlpinDale 3a0fdf7b9b chore: remove `image_input_type` from VLM config 7 months ago
  AlpinDale 63b735bc2a chore: optimize v2 block manager to match the performance of v1 7 months ago
  AlpinDale bcc60a6555 chore: optimize SequenceStatus.is_finished by switching to IntEnum 7 months ago
  AlpinDale cdff8e89f9 feat: introduce `DraftModelRunner` 7 months ago
  AlpinDale c0c336aaa3 refactor: registry for processing model inputs; quick_gelu; clip model support 7 months ago
  AlpinDale fad45609b8 chore: remove logical token blocks (turns out they are not needed) 7 months ago
  AlpinDale 405bb74612 Control plane comms refactor (#573) 7 months ago
  AlpinDale af43576da0 feat: add MLPSpeculator speculative decoding support (#572) 7 months ago
  AlpinDale 8d77c69cbd feat: support image processor and add llava example 7 months ago
  AlpinDale 9d19811d4f avoid the need to pass `None` values to `Sequence.inputs` 7 months ago
  AlpinDale 9099040472 feat: cross-attention kv caching support 7 months ago
  AlpinDale 90ceab32ff refactor: consolidate prompt args to LLM engines 7 months ago
  AlpinDale a94de94c44 refactor: combine the prefill and decode into a single API (#553) 7 months ago
  AlpinDale 342346afda improve hashing function 7 months ago
  AlpinDale be8154a8a0 feat: proper embeddings API with e5-mistral-7b support 7 months ago
  AlpinDale 197a6d2c16 auto disable speculative decoding by the running queue size 7 months ago
  AlpinDale 8b56dc4347 dict -> torch.Tensor for blocks_to_swap 7 months ago
  AlpinDale 21ce19b3ea blocks_to_copy dict -> torch.Tensor 7 months ago
  AlpinDale ef733aee43 implement ExecuteModelData to reduce executor complexity 7 months ago
  AlpinDale 79901b76de logprobs for target model (spec decoding) 7 months ago
  AlpinDale 2351a0e2cd feat: FlashInfer backend for decoding phase (#548) 7 months ago
  AlpinDale b1555eb208 add new grafana metrics 7 months ago
  AlpinDale aed64884c6 feat: prompt logprobs with chunked prefill (#539) 7 months ago
  AlpinDale 9d81716bfd [v0.5.3] Release Candidate (#388) 10 months ago
  AlpinDale 9181fa0396 feat: Triton kernels for sampling (#383) 11 months ago
  AlpinDale f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) 11 months ago
  AlpinDale e42a78381a feat: switch from pylint to ruff (#322) 1 year ago