AlpinDale
|
5b0c11d190
support pipeline parallel pynccl groups
|
7 months ago |
AlpinDale
|
de62ceb18c
refactor: eliminate parallel worker per-step task scheduling overhead
|
7 months ago |
AlpinDale
|
656459fd84
make fp8_e4m3 work on nvidia
|
7 months ago |
AlpinDale
|
0aaf2dfc6b
improve parallel logging
|
7 months ago |
AlpinDale
|
9e73559eba
make use of batched rotary embedding kernels to support long context lora
|
7 months ago |
AlpinDale
|
eaa06fdd14
fix some f-strings
|
7 months ago |
AlpinDale
|
c58589318f
remove the graph mode func
|
7 months ago |
AlpinDale
|
072b30fb42
measure end time within the cuda memory profiler
|
7 months ago |
AlpinDale
|
7bcff4ac03
implement sharded state dict
|
7 months ago |
AlpinDale
|
a94de94c44
refactor: combine the prefill and decode into a single API (#553)
|
7 months ago |
AlpinDale
|
01190e5049
use flash attention for the decoding phase
|
8 months ago |
AlpinDale
|
50b7c13db0
refactor: attention selector (#552)
|
8 months ago |
AlpinDale
|
b984fe4a91
refactor custom allreduce to support multiple tp groups
|
8 months ago |
AlpinDale
|
be8154a8a0
feat: proper embeddings API with e5-mistral-7b support
|
8 months ago |
AlpinDale
|
8ae2cce237
refactor pynccl
|
8 months ago |
AlpinDale
|
0e062e66d3
set block size at init
|
8 months ago |
AlpinDale
|
b55381df0e
speed up lora loading times by reusing the cpu dummy lora
|
8 months ago |
AlpinDale
|
3a0d1c7705
add get_name method to attention backends
|
8 months ago |
AlpinDale
|
2351a0e2cd
feat: FlashInfer backend for decoding phase (#548)
|
8 months ago |
AlpinDale
|
35ae01d7ba
refactor: attention metadata term
|
8 months ago |
AlpinDale
|
aed64884c6
feat: prompt logprobs with chunked prefill (#539)
|
8 months ago |
AlpinDale
|
fca911ee0a
vLLM Upstream Sync (#526)
|
8 months ago |
AlpinDale
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
10 months ago |
AlpinDale
|
78d66f16d1
Chunked Prefill Part 1 (#384)
|
11 months ago |
AlpinDale
|
9181fa0396
feat: Triton kernels for sampling (#383)
|
11 months ago |
AlpinDale
|
4b99ac15b7
fix: do not deepcopy metadata
|
11 months ago |
AlpinDale
|
17b034613d
chore: make metadata a dataclass (#377)
|
11 months ago |
AlpinDale
|
f8dfac6372
chore: attention refactor and upstream sync apr01 (#365)
|
11 months ago |
50h100a
|
b9e0ae87c5
fix fine-grained seeding.
|
1 year ago |
AlpinDale
|
e42a78381a
feat: switch from pylint to ruff (#322)
|
1 year ago |