AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 5 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 5 months ago
AlpinDale | 60e74e92fd | add rope_scaling arg | 5 months ago
AlpinDale | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 5 months ago
AlpinDale | c66b1b57b1 | Marlin 2:4 sparsity (#555) | 5 months ago
AlpinDale | 7bcff4ac03 | implement sharded state dict | 5 months ago
AlpinDale | 13e5ffd456 | fix distributed_executor_backend in args | 5 months ago
AlpinDale | c6a501f682 | add multiprocessing executor; make ray optional | 5 months ago
AlpinDale | e42d0b3455 | possibly improve ngram efficiency | 5 months ago
AlpinDale | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 5 months ago
AlpinDale | 4acf34417a | feat: add DeepSpeedFP quantization for all models | 5 months ago
AlpinDale | 197a6d2c16 | auto disable speculative decoding by the running queue size | 5 months ago
AlpinDale | 4476d2d885 | remove cuda version check | 5 months ago
AlpinDale | 2351a0e2cd | feat: FlashInfer backend for decoding phase (#548) | 5 months ago
AlpinDale | 35ae01d7ba | refactor: attention metadata term | 5 months ago
AlpinDale | 723c6acb84 | re-add ngram speculative decoding | 5 months ago
AlpinDale | f22b700ee4 | feat: marlin kernels for GPTQ (#547) | 5 months ago
AlpinDale | 110a2724f4 | extended -> llama3, also make rope_type in config work | 5 months ago
AlpinDale | e87c32bed3 | feat: full tensor parallel for LoRA layers (#545) | 5 months ago
AlpinDale | 3ab36e6b2d | feat: extended RoPE for Llama 3.1 (#543) | 5 months ago
AlpinDale | e7b1368156 | feat: Phi3 support | 6 months ago
AlpinDale | 46159b107a | formatting: pt1 | 6 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 6 months ago
AlpinDale | 42998e423c | better quant verification | 7 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago
AlpinDale | 78d66f16d1 | Chunked Prefill Part 1 (#384) | 9 months ago
AlpinDale | feb5840f2a | feat: async tokenization (#374) | 9 months ago
AlpinDale | 29c241c115 | fix: explicitly disallow installation on non-linux platforms (#373) | 9 months ago
AlpinDale | 97a2b26c97 | fix: assertion error when use_sliding_window is present | 9 months ago
AlpinDale | 0f6d56b07f | feat: model executor refactor (#367) | 9 months ago