AlpinDale | a4cbcfe59f | feat: disable logprob serialization to CPU for spec decode | 5 months ago
AlpinDale | 3a53ff1e01 | fix: raise an error for no draft token case when draft_tp>1 | 5 months ago
AlpinDale | 2c653a2268 | fix: make speculative decoding work with per-request seed | 5 months ago
AlpinDale | 16dff9babc | chore: enable bonus token in spec decoding for KV cache based models | 5 months ago
AlpinDale | d9f4c36edd | feat: Medusa speculative decoding support (#590) | 5 months ago
AlpinDale | dd378ea063 | feat: MLPSpeculator with tensor parallel | 5 months ago
AlpinDale | 7253e9052d | feat: integrate typical acceptance sampling for spec decoding | 5 months ago
AlpinDale | cdff8e89f9 | feat: introduce `DraftModelRunner` | 6 months ago
AlpinDale | b6ff0623a6 | chore: clean up branding | 6 months ago
AlpinDale | abbb730607 | feat: support draft model on different tensor parallel size | 6 months ago
AlpinDale | af43576da0 | feat: add MLPSpeculator speculative decoding support (#572) | 6 months ago
AlpinDale | 313e6e1ec7 | feat: add typical acceptance sampling | 6 months ago
AlpinDale | 4d1e613804 | chore: minor simplifications | 6 months ago
AlpinDale | e0886ee929 | feat: add `ProposerWorkerBase` abstract class | 6 months ago
AlpinDale | 344ddaac5a | properly disable speculative decoding | 6 months ago
AlpinDale | 5b0c11d190 | support pipeline parallel pynccl groups | 6 months ago
AlpinDale | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 6 months ago
AlpinDale | 236be273e5 | feat: tensor parallel speculative decoding (#554) | 6 months ago
AlpinDale | 197a6d2c16 | auto disable speculative decoding by the running queue size | 6 months ago
AlpinDale | 438f5bdce9 | fix ngrams | 6 months ago
AlpinDale | ef733aee43 | implement ExecuteModelData to reduce executor complexity | 6 months ago
AlpinDale | 79901b76de | logprobs for target model (spec decoding) | 6 months ago
AlpinDale | 723c6acb84 | re-add ngram speculative decoding | 6 months ago
AlpinDale | 7bcf4c3fc9 | centralize gpu worker construction | 6 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 9 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 10 months ago