AlpinDale | 7ca63930c8 | support deepseek_v3 model | 1 week ago
AlpinDale | cec4da1dab | quants: support w8a8 fp8 block-wise quantization from DS3 | 1 week ago
AlpinDale | 1390915778 | multi-step: add support for flashinfer attention backend (#1033) | 1 week ago
AlpinDale | a56bce4c94 | fix: remove duplicate assignment in Hermes2ProToolParser | 1 week ago
AlpinDale | c6e8cb058b | fix: lazy init _copy_stream (#1032) | 1 week ago
AlpinDale | 3d72d8212a | chore: remove accidental commit | 1 week ago
AlpinDale | 8d5d87e687 | vlm: support multiple images for qwen-vl (#1031) | 1 week ago
AlpinDale | 41ceb754a6 | vlm: fix internvl2 inference with various num_patches (#1030) | 1 week ago
AlpinDale | 6f59024522 | torch.compile: hide slicing under custom op for inductor (#1029) | 1 week ago
AlpinDale | d51720114b | chore: use RoPE cache for MRoPE method (#1028) | 1 week ago
AlpinDale | 65a59bbb6b | cpu: raise error if using encoder-decoder models (#1027) | 1 week ago
AlpinDale | b33cf04386 | quants: add bitsandbytes support for gemma2 model (#1026) | 1 week ago
AlpinDale | 7d5feaa037 | api: fix logic for deciding if tool parser is used (#1025) | 1 week ago
AlpinDale | ddaefd8d38 | chore: remove engine_use_ray (#1024) | 1 week ago
AlpinDale | 304e1e5a8a | core: dump model runner inputs during crash (#1023) | 1 week ago
AlpinDale | 1721bea53a | vlm: add support for Pixtral model (#1022) | 1 week ago
AlpinDale | 0859dc3bc0 | tests: refactor speculative decoding tests to remove the async engine (#1021) | 1 week ago
AlpinDale | fe01e2ded8 | chore: move `device` keys to a constant (#1020) | 1 week ago
AlpinDale | a113309876 | kernel: add meta functions for ops to prevent graph breaks (#1019) | 1 week ago
AlpinDale | f2b6dc3872 | cpu: add support for W8A8 quantization via compressed-tensor (#1017) | 1 week ago
AlpinDale | 2261a0e8dd | cpu: fix issue with sampling kernels (#1016) | 1 week ago
AlpinDale | 411ac4f405 | vlm: add support for Qwen2-VL model (#1015) | 1 week ago
AlpinDale | be59e30139 | vlm: add support for video modality + llava next video (#1014) | 1 week ago
AlpinDale | dcb36de9c4 | quants: add support for NVIDIA's ModelOpt checkpoints (#1013) | 1 week ago
AlpinDale | a59a5f64d2 | fix: internvl pipeline parallel (#1012) | 2 weeks ago
AlpinDale | 5224389dae | chore: skip loading extra bias for qwen2 moe GPTQ (#1011) | 2 weeks ago
AlpinDale | 51d24fc7c0 | build: shallow clone cutlass 3.5.1 tag (#1010) | 2 weeks ago
AlpinDale | 4737c22ab3 | fix: pass `APHRODITE_ATTENTION_BACKEND` to ray workers (#1009) | 2 weeks ago
AlpinDale | de341ffb00 | fix: ensure multistep lookahead allocation is compatible with cugraph max capture (#1008) | 2 weeks ago
AlpinDale | 9a42869055 | chore: keep chunked prefill enabled with prefix caching (#1007) | 2 weeks ago