Author | Commit | Message | Date
AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 7 months ago
AlpinDale | f6250c5516 | move dockerfiles to root; fix cpu build | 7 months ago
AlpinDale | 4e1ae004da | make mp the default distributed backend | 7 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago
AlpinDale | 60e74e92fd | add rope_scaling arg | 7 months ago
AlpinDale | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago
AlpinDale | 7bcff4ac03 | implement sharded state dict | 7 months ago
AlpinDale | 13e5ffd456 | fix distributed_executor_backend in args | 7 months ago
AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 7 months ago
AlpinDale | c6a501f682 | add multiprocessing executor; make ray optional | 7 months ago
AlpinDale | 0cea453d36 | automatically detect tensorized models | 7 months ago
AlpinDale | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 7 months ago
AlpinDale | 4acf34417a | feat: add DeepSpeedFP quantization for all models | 7 months ago
AlpinDale | 197a6d2c16 | auto disable speculative decoding by the running queue size | 7 months ago
AlpinDale | 21ce19b3ea | blocks_to_copy dict -> torch.Tensor | 7 months ago
Brian Dashore | 5533ab845e | feat: add uvloop (#550) | 7 months ago
AlpinDale | 35ae01d7ba | refactor: attention metadata term | 7 months ago
AlpinDale | 723c6acb84 | re-add ngram speculative decoding | 7 months ago
AlpinDale | e87c32bed3 | feat: full tensor parallel for LoRA layers (#545) | 7 months ago
AlpinDale | 46159b107a | formatting: pt1 | 8 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 8 months ago
AlpinDale | 42998e423c | better quant verification | 9 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago
AlpinDale | 78d66f16d1 | Chunked Prefill Part 1 (#384) | 11 months ago
AlpinDale | feb5840f2a | feat: async tokenization (#374) | 11 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 11 months ago
AlpinDale | c41462cfcd | feat: exllamav2 quantization (#305) | 1 year ago
AlpinDale | 9810daa699 | feat: INT8 KV Cache (#298) | 1 year ago
AlpinDale | e0c35bb353 | feat: bitsandbytes and `--load-in{4,8}bit` support (#294) | 1 year ago
AlpinDale | 705821a7fe | feat: AQLM quantization support (#293) | 1 year ago