AlpinDale | 4599c98f99 | feat: dynamic image size support for VLMs | 6 months ago
AlpinDale | 5be90c3859 | Mamba infrastructure support (#586) | 6 months ago
AlpinDale | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 7 months ago
AlpinDale | 3a0fdf7b9b | chore: remove `image_input_type` from VLM config | 7 months ago
AlpinDale | 63b735bc2a | chore: optimize v2 block manager to match the performance of v1 | 7 months ago
AlpinDale | bcc60a6555 | chore: optimize SequenceStatus.is_finished by switching to IntEnum | 7 months ago
AlpinDale | cdff8e89f9 | feat: introduce `DraftModelRunner` | 7 months ago
AlpinDale | c0c336aaa3 | refactor: registry for processing model inputs; quick_gelu; clip model support | 7 months ago
AlpinDale | fad45609b8 | chore: remove logical token blocks (turns out they are not needed) | 7 months ago
AlpinDale | 405bb74612 | Control plane comms refactor (#573) | 7 months ago
AlpinDale | af43576da0 | feat: add MLPSpeculator speculative decoding support (#572) | 7 months ago
AlpinDale | 8d77c69cbd | feat: support image processor and add llava example | 7 months ago
AlpinDale | 9d19811d4f | avoid the need to pass `None` values to `Sequence.inputs` | 7 months ago
AlpinDale | 9099040472 | feat: cross-attention kv caching support | 7 months ago
AlpinDale | 90ceab32ff | refactor: consolidate prompt args to LLM engines | 7 months ago
AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 7 months ago
AlpinDale | 342346afda | improve hashing function | 7 months ago
AlpinDale | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 7 months ago
AlpinDale | 197a6d2c16 | auto disable speculative decoding by the running queue size | 7 months ago
AlpinDale | 8b56dc4347 | dict -> torch.Tensor for blocks_to_swap | 7 months ago
AlpinDale | 21ce19b3ea | blocks_to_copy dict -> torch.Tensor | 7 months ago
AlpinDale | ef733aee43 | implement ExecuteModelData to reduce executor complexity | 7 months ago
AlpinDale | 79901b76de | logprobs for target model (spec decoding) | 7 months ago
AlpinDale | 2351a0e2cd | feat: FlashInfer backend for decoding phase (#548) | 7 months ago
AlpinDale | b1555eb208 | add new grafana metrics | 7 months ago
AlpinDale | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 7 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago
AlpinDale | 9181fa0396 | feat: Triton kernels for sampling (#383) | 11 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 11 months ago
AlpinDale | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago