AlpinDale
|
9866af1626
chore: optimize scheduler and remove policy
|
4 months ago |
AlpinDale
|
1d8616e4f7
fix: massively improve throughput with high number of prompts
|
4 months ago |
AlpinDale
|
d8a51d05a7
fix: seeded gens with pipeline parallel
|
4 months ago |
AlpinDale
|
e76bbe72eb
chore: handle aborted requests for jamba
|
5 months ago |
AlpinDale
|
99680b2d23
feat: soft prompts (#589)
|
5 months ago |
AlpinDale
|
5be90c3859
Mamba infrastructure support (#586)
|
5 months ago |
AlpinDale
|
ae04f57ec1
feat: Pipeline Parallel support (#581)
|
5 months ago |
AlpinDale
|
29ddfae8de
fix: typo in scheduler
|
5 months ago |
AlpinDale
|
28e45a6209
fix: attempting to remove a lora that has already been removed
|
5 months ago |
AlpinDale
|
3f92035bf1
fix: add `ignored_seq_groups` in `_schedule_chunked_prefill`
|
5 months ago |
AlpinDale
|
237fa59aea
feat: support CPU/GPU swapping in BlockManagerV2
|
5 months ago |
AlpinDale
|
5b0c11d190
support pipeline parallel pynccl groups
|
6 months ago |
AlpinDale
|
b7667151e5
fix scheduler being off by one for lora support
|
6 months ago |
AlpinDale
|
9e73559eba
make use of batched rotary embedding kernels to support long context lora
|
6 months ago |
AlpinDale
|
eaa06fdd14
fix some f-strings
|
6 months ago |
AlpinDale
|
342346afda
improve hashing function
|
6 months ago |
AlpinDale
|
fd0a5c0ea4
raise a warning during preemption and swapping
|
6 months ago |
AlpinDale
|
be8154a8a0
feat: proper embeddings API with e5-mistral-7b support
|
6 months ago |
AlpinDale
|
8b56dc4347
dict -> torch.Tensor for blocks_to_swap
|
6 months ago |
AlpinDale
|
148aca8ff1
cow => dict[int, list] -> list
|
6 months ago |
AlpinDale
|
21ce19b3ea
blocks_to_copy dict -> torch.Tensor
|
6 months ago |
AlpinDale
|
ef733aee43
implement ExecuteModelData to reduce executor complexity
|
6 months ago |
AlpinDale
|
25c2b6feca
ignore infeasible swap requests
|
6 months ago |
AlpinDale
|
5529304d1f
fix sampling with n>1
|
6 months ago |
AlpinDale
|
aed64884c6
feat: prompt logprobs with chunked prefill (#539)
|
6 months ago |
AlpinDale
|
fca911ee0a
vLLM Upstream Sync (#526)
|
6 months ago |
AlpinDale
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
8 months ago |
AlpinDale
|
78d66f16d1
Chunked Prefill Part 1 (#384)
|
9 months ago |
AlpinDale
|
f8dfac6372
chore: attention refactor and upstream sync apr01 (#365)
|
9 months ago |
AlpinDale
|
e42a78381a
feat: switch from pylint to ruff (#322)
|
10 months ago |