AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 6 months ago
AlpinDale | 342346afda | improve hashing function | 6 months ago
AlpinDale | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 6 months ago
AlpinDale | 197a6d2c16 | auto disable speculative decoding by the running queue size | 6 months ago
AlpinDale | 8b56dc4347 | dict -> torch.Tensor for blocks_to_swap | 6 months ago
AlpinDale | 21ce19b3ea | blocks_to_copy dict -> torch.Tensor | 6 months ago
AlpinDale | ef733aee43 | implement ExecuteModelData to reduce executor complexity | 6 months ago
AlpinDale | 79901b76de | logprobs for target model (spec decoding) | 6 months ago
AlpinDale | 2351a0e2cd | feat: FlashInfer backend for decoding phase (#548) | 6 months ago
AlpinDale | b1555eb208 | add new grafana metrics | 6 months ago
AlpinDale | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 6 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago
AlpinDale | 9181fa0396 | feat: Triton kernels for sampling (#383) | 9 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 9 months ago
AlpinDale | e42a78381a | feat: switch from pylint to ruff (#322) | 10 months ago
AlpinDale | 9810daa699 | feat: INT8 KV Cache (#298) | 10 months ago
AlpinDale | ac82b67f75 | feat: naive context shift and various QoL changes (#289) | 10 months ago
AlpinDale | 657aec0cbd | refactor: OpenAI endpoint (#261) | 10 months ago
AlpinDale | 4d04ade9ef | feat: fine-grained seeds (#279) | 11 months ago
AlpinDale | d2db4143fa | feat: add grafana for metrics (#240) | 11 months ago
AlpinDale | c0aac15421 | feat: S-LoRA support (#222) | 1 year ago
AlpinDale | 8fa608aeb7 | feat: replace Ray with NCCL for control plane comms (#221) | 1 year ago
AlpinDale | f013d714c0 | chore: merge dev branch into main (#177) | 1 year ago
AlpinDale | 2755a48d51 | merge dev branch into main (#153) | 1 year ago
50h100a | fa0ae5a2c9 | feat: new mirostatv2 implementation (#96) | 1 year ago
AlpinDale | efc6f7fbec | chore: reformats (#90) | 1 year ago
AlpinDale | e6be0118c9 | feat: prompt logprobs and batched samplers (#77) | 1 year ago
AlpinDale | 75c27d3e65 | massive overhaul | 1 year ago
AlpinDale | 6dfca14e1f | compute logprobs with log_softmax instead of log | 1 year ago
AlpinDale | 6b9561ef07 | adapt TGI incremental detokenization | 1 year ago