AlpinDale | 3f49a55f82 | feat: add INT8 W8A16 quant for TPU (#663) | 4 months ago
AlpinDale | 5dd0145414 | chore: update the env.py script and the bug report template (#662) | 4 months ago
AlpinDale | 1927ce2be4 | fix: `get_num_blocks_touched` logic (#661) | 4 months ago
AlpinDale | ed9a6f97c1 | fix: kill api server when pinging dead engine (#660) | 4 months ago
AlpinDale | 6d54f7687d | fix: lora with pipeline parallel (#659) | 4 months ago
AlpinDale | 3405782f24 | fix: max_num_batched_tokens should not be limited for lora (#658) | 4 months ago
AlpinDale | 67ee885293 | fix: flashinfer outputs (#657) | 4 months ago
AlpinDale | 0e5bb11503 | fix: make `merge_async_iterators.is_cancelled()` optional (#656) | 4 months ago
AlpinDale | 3170c0d4c6 | fix: GPTQ/AWQ on Colab (#655) | 4 months ago
AlpinDale | 83bcb9119a | fix: multiprocessing timeout (#654) | 4 months ago
AlpinDale | 1e119cbeb6 | fix: input processor in internvl2 (#653) | 4 months ago
AlpinDale | a2344d3617 | fix: move zeromq rpc frontend to IPC instead of TCP (#652) | 4 months ago
AlpinDale | f1e1d0bd3d | feat: introduce `BaseAphroditeParameter` (#646) | 4 months ago
AlpinDale | 47ac074937 | fix: RSLoRA support (#647) | 4 months ago
50h100a | b96ba9930e | Merge pull request #644 from 50h100a/quadfix | 4 months ago
AlpinDale | 59264d32e9 | fix: hardcoded float16 in embedding mode check (#645) | 4 months ago
50h100a | cbdf2d986f | quadratic sampling: separate diff from logits to avoid NaNs. | 4 months ago
AlpinDale | 31f82da8bd | chore: deduplicate nvlink check to cuda platform (#643) | 4 months ago
AlpinDale | 3648170750 | fix: gracefully handle missing chat template (#642) | 4 months ago
AlpinDale | 77c4fbd5c9 | fix: better async request cancellation (#641) | 4 months ago
AlpinDale | a03e0e2ea4 | ci: exclude cu118 and cu121 from build and add py_limited_api (#639) | 4 months ago
AlpinDale | db81a67c54 | bump to v0.6.0.post1 (#635) | 4 months ago
AlpinDale | c147670c13 | fix: clean up incorrect log in worker (#636) | 4 months ago
AlpinDale | 308501daa5 | fix: default api port and attention selector (#634) | 4 months ago
AlpinDale | a0e446a17d | feat: initial encoder-decoder support with BART model (#633) | 4 months ago
AlpinDale | 337071f484 | chore: optimize evictor v2 performance (#631) | 4 months ago
AlpinDale | a401f8e05d | feat: per-tensor token epilogue kernels (#630) | 4 months ago
AlpinDale | 09b82f9963 | feat: Add support for GPU device selection in SpecDecodeBaseSampler (#629) | 4 months ago
AlpinDale | f5cca12da8 | feat: multi-image input for minicpmv (#628) | 4 months ago
Trapper4888 | ba848b00f3 | readme: fix model name typo (#627) | 4 months ago