AlpinDale
|
c2bb886b2e
fix: reinit procedure in `ModelInputForGPUBuilder` (#675)
|
6 months ago |
AlpinDale
|
bf88c8567e
feat: mamba model support (#674)
|
6 months ago |
AlpinDale
|
8583aefed7
chore: mamba cache single buffer (#673)
|
6 months ago |
AlpinDale
|
19ad952dd4
chore: better stream termination in async engine (#672)
|
6 months ago |
AlpinDale
|
1394008421
chore: decouple `should_modify_greedy_probs_inplace` (#671)
|
6 months ago |
AlpinDale
|
2da6a3ec2b
feat: option to apply temperature scaling last (#670)
|
6 months ago |
AlpinDale
|
e3a53712f2
fix: mlpspeculator with padded vocab (#669)
|
6 months ago |
AlpinDale
|
e200775863
feat: enable using fp8 kv and prefix caching with chunked prefill (#668)
|
6 months ago |
AlpinDale
|
ef40c05cd3
fix: minor adjustments to scheduler and block manager (#667)
|
6 months ago |
AlpinDale
|
7df7b8ca53
optimization: reduce end-to-end overhead from python obj allocation (#666)
|
6 months ago |
AlpinDale
|
ea78357d70
fix: deps with TPU dockerfile (#665)
|
6 months ago |
AlpinDale
|
62111fab17
feat: allow serving encoder-decoder models in the API server (#664)
|
6 months ago |
AlpinDale
|
3f49a55f82
feat: add INT8 W8A16 quant for TPU (#663)
|
6 months ago |
AlpinDale
|
5dd0145414
chore: update the env.py script and the bug report template (#662)
|
6 months ago |
AlpinDale
|
1927ce2be4
fix: `get_num_blocks_touched` logic (#661)
|
6 months ago |
AlpinDale
|
ed9a6f97c1
fix: kill api server when pinging dead engine (#660)
|
6 months ago |
AlpinDale
|
6d54f7687d
fix: lora with pipeline parallel (#659)
|
6 months ago |
AlpinDale
|
3405782f24
fix: max_num_batched_tokens should not be limited for lora (#658)
|
6 months ago |
AlpinDale
|
67ee885293
fix: flashinfer outputs (#657)
|
6 months ago |
AlpinDale
|
0e5bb11503
fix: make `merge_async_iterators.is_cancelled()` optional (#656)
|
6 months ago |
AlpinDale
|
3170c0d4c6
fix: GPTQ/AWQ on Colab (#655)
|
6 months ago |
AlpinDale
|
83bcb9119a
fix: multiprocessing timeout (#654)
|
6 months ago |
AlpinDale
|
1e119cbeb6
fix: input processor in internvl2 (#653)
|
6 months ago |
AlpinDale
|
a2344d3617
fix: move zeromq rpc frontend to IPC instead of TCP (#652)
|
6 months ago |
AlpinDale
|
f1e1d0bd3d
feat: introduce `BaseAphroditeParameter` (#646)
|
6 months ago |
AlpinDale
|
47ac074937
fix: RSLoRA support (#647)
|
6 months ago |
50h100a
|
b96ba9930e
Merge pull request #644 from 50h100a/quadfix
|
6 months ago |
AlpinDale
|
59264d32e9
fix: hardcoded float16 in embedding mode check (#645)
|
6 months ago |
50h100a
|
cbdf2d986f
quadratic sampling: separate diff from logits to avoid NaNs.
|
6 months ago |
AlpinDale
|
31f82da8bd
chore: deduplicate nvlink check to cuda platform (#643)
|
6 months ago |