AlpinDale
|
45a004874c
chore: allow specifying custom Executor
|
5 months ago |
AlpinDale
|
b7a2d52e47
fix: allow using mp executor for pipeline parallel
|
5 months ago |
AlpinDale
|
99680b2d23
feat: soft prompts (#589)
|
5 months ago |
AlpinDale
|
4f7d212b70
feat: remove vision language config
|
5 months ago |
AlpinDale
|
ae04f57ec1
feat: Pipeline Parallel support (#581)
|
5 months ago |
AlpinDale
|
405bb74612
Control plane comms refactor (#573)
|
6 months ago |
AlpinDale
|
25feb1d592
chore: add support for pinning lora adapters in the lru cache
|
6 months ago |
AlpinDale
|
236be273e5
feat: tensor parallel speculative decoding (#554)
|
6 months ago |
AlpinDale
|
be8154a8a0
feat: proper embeddings API with e5-mistral-7b support
|
6 months ago |
AlpinDale
|
197a6d2c16
auto disable speculative decoding by the running queue size
|
6 months ago |
AlpinDale
|
438f5bdce9
fix ngrams
|
6 months ago |
AlpinDale
|
ef733aee43
implement ExecuteModelData to reduce executor complexity
|
6 months ago |
AlpinDale
|
723c6acb84
re-add ngram speculative decoding
|
6 months ago |
AlpinDale
|
7bcf4c3fc9
centralize gpu worker construction
|
6 months ago |
AlpinDale
|
957ed7d244
type hints
|
6 months ago |
AlpinDale
|
fca911ee0a
vLLM Upstream Sync (#526)
|
7 months ago |
AlpinDale
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
9 months ago |
AlpinDale
|
0f6d56b07f
feat: model executor refactor (#367)
|
10 months ago |
AlpinDale
|
f8dfac6372
chore: attention refactor and upstream sync apr01 (#365)
|
10 months ago |