AlpinDale
|
6671e3a162
feat: add CPU offloading support (#598)
|
5 months ago |
AlpinDale
|
22305c91e9
refactor _prepare_model_input_tensor and attn metadata builder for most backends
|
5 months ago |
AlpinDale
|
5289c14b24
feat: Asymmetric Tensor Parallel (#594)
|
5 months ago |
AlpinDale
|
99680b2d23
feat: soft prompts (#589)
|
5 months ago |
AlpinDale
|
c11a8bdaad
fix: calculate max number of multi-modal tokens automatically
|
5 months ago |
AlpinDale
|
151d782233
fix: attention softcapping for flashinfer
|
5 months ago |
AlpinDale
|
4f7d212b70
feat: remove vision language config
|
5 months ago |
AlpinDale
|
4599c98f99
feat: dynamic image size support for VLMs
|
5 months ago |
AlpinDale
|
5be90c3859
Mamba infrastructure support (#586)
|
5 months ago |
AlpinDale
|
ae04f57ec1
feat: Pipeline Parallel support (#581)
|
5 months ago |
AlpinDale
|
3a0fdf7b9b
chore: remove `image_input_type` from VLM config
|
5 months ago |
AlpinDale
|
b6e60143e7
Flashinfer for prefill phase (#580)
|
5 months ago |
AlpinDale
|
cdff8e89f9
feat: introduce `DraftModelRunner`
|
5 months ago |
AlpinDale
|
c0c336aaa3
refactor: registry for processing model inputs; quick_gelu; clip model support
|
5 months ago |
AlpinDale
|
56e0b8223c
chore: add base class for LoRA-supported models
|
5 months ago |
AlpinDale
|
dead030abf
fix: cuda graph with MLPSpeculator
|
5 months ago |
AlpinDale
|
405bb74612
Control plane comms refactor (#573)
|
5 months ago |
AlpinDale
|
25feb1d592
chore: add support for pinning lora adapters in the lru cache
|
5 months ago |
AlpinDale
|
af43576da0
feat: add MLPSpeculator speculative decoding support (#572)
|
5 months ago |
AlpinDale
|
34b41e0a87
chore: add coordinator to reduce code duplication in tp and pp
|
6 months ago |
AlpinDale
|
d0cca80b8b
feat: support sharded tensorizer models
|
6 months ago |
AlpinDale
|
4d1e613804
chore: minor simplifications
|
6 months ago |
AlpinDale
|
6cecbbff6a
fix: reduce memory footprint of cuda graph by adding output buffer
|
6 months ago |
AlpinDale
|
c975bba905
fix: sharded state loader with lora
|
6 months ago |
AlpinDale
|
e321d80e4e
fix: `prompt_logprobs==0` case
|
6 months ago |
AlpinDale
|
8d77c69cbd
feat: support image processor and add llava example
|
6 months ago |
AlpinDale
|
08f639b8aa
remove duplicate seq_lens_tensor
|
6 months ago |
AlpinDale
|
f40b809d3b
allow using v2 block manager with sliding window
|
6 months ago |
AlpinDale
|
5b0c11d190
support pipeline parallel pynccl groups
|
6 months ago |
AlpinDale
|
de62ceb18c
refactor: eliminate parallel worker per-step task scheduling overhead
|
6 months ago |