Commit History

Author SHA1 Message Date
  AlpinDale 4599c98f99 feat: dynamic image size support for VLMs 7 months ago
  AlpinDale 5be90c3859 Mamba infrastructure support (#586) 7 months ago
  AlpinDale ae04f57ec1 feat: Pipeline Parallel support (#581) 7 months ago
  AlpinDale 3a0fdf7b9b chore: remove `image_input_type` from VLM config 7 months ago
  AlpinDale b6e60143e7 Flashinfer for prefill phase (#580) 7 months ago
  AlpinDale cdff8e89f9 feat: introduce `DraftModelRunner` 7 months ago
  AlpinDale c0c336aaa3 refactor: registry for processing model inputs; quick_gelu; clip model support 7 months ago
  AlpinDale 56e0b8223c chore: add base class for LoRA-supported models 7 months ago
  AlpinDale dead030abf fix: cuda graph with MLPSpeculator 7 months ago
  AlpinDale 405bb74612 Control plane comms refactor (#573) 7 months ago
  AlpinDale 25feb1d592 chore: add support for pinning lora adapters in the lru cache 7 months ago
  AlpinDale af43576da0 feat: add MLPSpeculator speculative decoding support (#572) 7 months ago
  AlpinDale 34b41e0a87 chore: add coordinator to reduce code duplication in tp and pp 7 months ago
  AlpinDale d0cca80b8b feat: support sharded tensorizer models 7 months ago
  AlpinDale 4d1e613804 chore: minor simplifications 7 months ago
  AlpinDale 6cecbbff6a fix: reduce memory footprint of cuda graph by adding output buffer 7 months ago
  AlpinDale c975bba905 fix: sharded state loader with lora 7 months ago
  AlpinDale e321d80e4e fix: `prompt_logprobs==0` case 7 months ago
  AlpinDale 8d77c69cbd feat: support image processor and add llava example 7 months ago
  AlpinDale 08f639b8aa remove duplicate seq_lens_tensor 7 months ago
  AlpinDale f40b809d3b allow using v2 block manager with sliding window 7 months ago
  AlpinDale 5b0c11d190 support pipeline parallel pynccl groups 8 months ago
  AlpinDale de62ceb18c refactor: eliminate parallel worker per-step task scheduling overhead 8 months ago
  AlpinDale 656459fd84 make fp8_e4m3 work on nvidia 8 months ago
  AlpinDale 0aaf2dfc6b improve parallel logging 8 months ago
  AlpinDale 9e73559eba make use of batched rotary embedding kernels to support long context lora 8 months ago
  AlpinDale eaa06fdd14 fix some f-strings 8 months ago
  AlpinDale c58589318f remove the graph mode func 8 months ago
  AlpinDale 072b30fb42 measure end time within the cuda memory profiler 8 months ago
  AlpinDale 7bcff4ac03 implement sharded state dict 8 months ago