AlpinDale | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 8 months ago
AlpinDale | eaa06fdd14 | fix some f-strings | 8 months ago
AlpinDale | c58589318f | remove the graph mode func | 8 months ago
AlpinDale | 072b30fb42 | measure end time within the cuda memory profiler | 8 months ago
AlpinDale | 7bcff4ac03 | implement sharded state dict | 8 months ago
AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 8 months ago
AlpinDale | 01190e5049 | use flash attention for the decoding phase | 8 months ago
AlpinDale | 50b7c13db0 | refactor: attention selector (#552) | 8 months ago
AlpinDale | b984fe4a91 | refactor custom allreduce to support multiple tp groups | 8 months ago
AlpinDale | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 8 months ago
AlpinDale | 8ae2cce237 | refactor pynccl | 8 months ago
AlpinDale | 0e062e66d3 | set block size at init | 8 months ago
AlpinDale | b55381df0e | speedup lora loading times by reusing the cpu dummy lora | 8 months ago
AlpinDale | 3a0d1c7705 | add get_name method to attention backends | 8 months ago
AlpinDale | 2351a0e2cd | feat: FlashInfer backend for decoding phase (#548) | 8 months ago
AlpinDale | 35ae01d7ba | refactor: attention metadata term | 8 months ago
AlpinDale | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 8 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 8 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago
AlpinDale | 78d66f16d1 | Chunked Prefill Part 1 (#384) | 11 months ago
AlpinDale | 9181fa0396 | feat: Triton kernels for sampling (#383) | 11 months ago
AlpinDale | 4b99ac15b7 | fix: do not deepcopy metadata | 11 months ago
AlpinDale | 17b034613d | chore: make metadata a dataclass (#377) | 11 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 11 months ago
50h100a | b9e0ae87c5 | fix fine-grained seeding. | 1 year ago
AlpinDale | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago
sgsdxzy | 50c0875c32 | chore: log total memory usage (#316) | 1 year ago
AlpinDale | c2d77b1822 | chore: logging refactor (#302) | 1 year ago
AlpinDale | 9810daa699 | feat: INT8 KV Cache (#298) | 1 year ago
AlpinDale | ac82b67f75 | feat: naive context shift and various QoL changes (#289) | 1 year ago