Author | Commit | Message | Age
AlpinDale | 13e5ffd456 | fix distributed_executor_backend in args | 6 months ago
AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 6 months ago
AlpinDale | c6a501f682 | add multiprocessing executor; make ray optional | 6 months ago
AlpinDale | 0cea453d36 | automatically detect tensorized models | 6 months ago
AlpinDale | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 6 months ago
AlpinDale | 4acf34417a | feat: add DeepSpeedFP quantization for all models | 6 months ago
AlpinDale | 197a6d2c16 | auto disable speculative decoding by the running queue size | 6 months ago
AlpinDale | 21ce19b3ea | blocks_to_copy dict -> torch.Tensor | 6 months ago
Brian Dashore | 5533ab845e | feat: add uvloop (#550) | 6 months ago
AlpinDale | 35ae01d7ba | refactor: attention metadata term | 6 months ago
AlpinDale | 723c6acb84 | re-add ngram speculative decoding | 6 months ago
AlpinDale | e87c32bed3 | feat: full tensor parallel for LoRA layers (#545) | 6 months ago
AlpinDale | 46159b107a | formatting: pt1 | 7 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 7 months ago
AlpinDale | 42998e423c | better quant verification | 8 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago
AlpinDale | 78d66f16d1 | Chunked Prefill Part 1 (#384) | 10 months ago
AlpinDale | feb5840f2a | feat: async tokenization (#374) | 10 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 10 months ago
AlpinDale | c41462cfcd | feat: exllamav2 quantization (#305) | 10 months ago
AlpinDale | 9810daa699 | feat: INT8 KV Cache (#298) | 11 months ago
AlpinDale | e0c35bb353 | feat: bitsandbytes and `--load-in{4,8}bit` support (#294) | 11 months ago
AlpinDale | 705821a7fe | feat: AQLM quantization support (#293) | 11 months ago
AlpinDale | ac82b67f75 | feat: naive context shift and various QoL changes (#289) | 11 months ago
AlpinDale | 72229a94da | feat: better marlin kernels (#285) | 11 months ago
AlpinDale | 657aec0cbd | refactor: OpenAI endpoint (#261) | 11 months ago
AlpinDale | 4d04ade9ef | feat: fine-grained seeds (#279) | 11 months ago
AlpinDale | ea0f57b233 | feat: allow further support for non-cuda devices (#247) | 1 year ago
AlpinDale | c3a221eb02 | feat: GGUF, QuIP#, and Marlin support (#228) | 1 year ago
AlpinDale | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 1 year ago