AlpinDale | 4ed1bb9958 | chore: add fault tolerance for RayTokenizerGroupPool | 7 months ago
AlpinDale | 622de63c03 | fix: remove useless code from cpu worker | 7 months ago
AlpinDale | 80ac1cdc8f | fix: add args for the draft tp | 7 months ago
AlpinDale | abbb730607 | feat: support draft model on different tensor parallel size | 7 months ago
AlpinDale | 5974495461 | chore: phi3v resize for dynamic shape | 7 months ago
AlpinDale | e238abf0cc | chore: send and recv helper functions | 7 months ago
AlpinDale | 051daa0435 | fix: add cutlass2x fallback kernels | 7 months ago
AlpinDale | 3389dcdde5 | fix: why the hell was this not committed? | 7 months ago
AlpinDale | 25feb1d592 | chore: add support for pinning lora adapters in the lru cache | 7 months ago
AlpinDale | 0c662bc813 | fix: exclude modelscope 1.15.0 | 7 months ago
AlpinDale | b3d2b639d2 | feat: add `gelu_quick` CPU kernel | 7 months ago
AlpinDale | 1b340083b1 | feat: add shm broadcast | 7 months ago
AlpinDale | 8a6e83b52e | feat: fully sharded QKVParallelLinearWithLora support | 7 months ago
AlpinDale | 4f42985b5c | feat: qwen2 lora shapes | 7 months ago
AlpinDale | af43576da0 | feat: add MLPSpeculator speculative decoding support (#572) | 7 months ago
AlpinDale | ead08e9711 | fix: missing next_pow_2 header function | 7 months ago
AlpinDale | 7a3e38f79c | fix: cutlass kernel compilation | 7 months ago
AlpinDale | 017b42c517 | chore: use fork as the default method for mp backend | 7 months ago
AlpinDale | cd9ed8623b | fix: cuda version check for fp8 support in the cutlass kernels | 7 months ago
AlpinDale | fad77538de | feat: update cutlass int8 kernel configs for sm90 | 7 months ago
AlpinDale | b753ff7870 | feat: per-channel support for static activation quant | 7 months ago
AlpinDale | 3c7444c89b | fix: asyncio.run hangs in python < 3.12 | 7 months ago
AlpinDale | d44ac8e497 | fix: `--preemption_mode` -> `--preemption-mode` | 7 months ago
AlpinDale | bcf9c83e6a | fix: incorrect args passed to generate() method in phi3v example | 7 months ago
AlpinDale | 025322ee5f | fix: fp8 kv cache for qwen2 models | 7 months ago
AlpinDale | 323fe23b21 | chore: use 127.0.0.1 for single-node setups | 7 months ago
AlpinDale | 89be49d058 | fix: build for mi300x | 7 months ago
AlpinDale | 7d3da17e19 | fix: phi3 rope scaling | 7 months ago
AlpinDale | 765adcfba1 | chore: add w8a8 benchmark scripts | 7 months ago
AlpinDale | 1587fab5de | fix: cuda version check for mma warning suppression | 7 months ago