AlpinDale
|
2050b42f3f
fix: remove unused code in sampler
|
4 months ago |
AlpinDale
|
705e50f4bd
fix: broadcasting logic for multi_modal_kwargs
|
4 months ago |
AlpinDale
|
d8a51d05a7
fix: seeded gens with pipeline parallel
|
4 months ago |
AlpinDale
|
4abbbdad78
chore: make triton fully optional
|
4 months ago |
AlpinDale
|
ff84ebbb04
chore: use array to speedup padding
|
4 months ago |
AlpinDale
|
dd18c5042c
move prepare_inputs to the GPU (#596)
|
4 months ago |
AlpinDale
|
ebf8a53618
feat: optimize throughput to 1.4x by using numpy for token padding
|
4 months ago |
AlpinDale
|
63b735bc2a
chore: optimize v2 block manager to match the performance of v1
|
4 months ago |
AlpinDale
|
d1f91d0f70
fix: greedy sampling being not greedy in concurrent situations where penalties are used
|
5 months ago |
AlpinDale
|
b9a5a0ae79
fix: avoid copying prompt/output tokens if penalties arent used
|
5 months ago |
AlpinDale
|
e321d80e4e
fix: `prompt_logprobs==0` case
|
5 months ago |
AlpinDale
|
35ae01d7ba
refactor: attention metadata term
|
5 months ago |
AlpinDale
|
9ce319b03c
fix: sampler indexing issues in distributed environments (#546)
|
5 months ago |
AlpinDale
|
aed64884c6
feat: prompt logprobs with chunked prefill (#539)
|
5 months ago |
AlpinDale
|
fca911ee0a
vLLM Upstream Sync (#526)
|
6 months ago |
AlpinDale
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
8 months ago |
50h100a
|
0634b8a3a6
fix memory pinning conditional
|
9 months ago |
50h100a
|
d3dd170a7d
merge main
|
9 months ago |
50h100a
|
dc09dc2b4d
Merge branch 'main' into pr_samplers
|
9 months ago |
AlpinDale
|
f8dfac6372
chore: attention refactor and upstream sync apr01 (#365)
|
9 months ago |
50h100a
|
7ed57e318d
Overhauled SamplingTensors construction.
|
9 months ago |
AlpinDale
|
9fa99215f8
feat: add cubic sampling (#280)
|
10 months ago |
AlpinDale
|
657aec0cbd
refactor: OpenAI endpoint (#261)
|
10 months ago |
AlpinDale
|
4d04ade9ef
feat: fine-grained seeds (#279)
|
10 months ago |
AlpinDale
|
1c46fa31ad
feat: add quadratic sampling (#233)
|
11 months ago |
AlpinDale
|
8fa608aeb7
feat: replace Ray with NCCL for control plane comms (#221)
|
11 months ago |
AlpinDale
|
a39eeb7188
fix: logprobs for dynatemp (#215)
|
11 months ago |
Stefan Gligorijevic
|
56446a04bb
feat: dynamic temperature (#209)
|
11 months ago |
AlpinDale
|
1394eab8ab
fix temperature being set to 1 in all cases (#210)
|
11 months ago |
AlpinDale
|
d54791aaa8
feat: reduce sampler overhead by making it less blocking (#198)
|
1 year ago |