AlpinDale | 4b1b658855 | tpu: implement multi-step scheduling (#1046) | 1 week ago
AlpinDale | 960dee2f97 | torch.compile: fix functionalization (#1045) | 1 week ago
AlpinDale | ce7b602f03 | model: add support for MiniCPM-3 (#1044) | 1 week ago
AlpinDale | 4a7cb8f232 | rocm: add custom paged attention kernels for ROCm (#1043) | 1 week ago
AlpinDale | 6951928522 | xpu: bump IPEX to 2.3, support GQA (#1042) | 1 week ago
AlpinDale | 9797d38b24 | torch.compile: allow adding custom compile backends via plugins (#1041) | 1 week ago
AlpinDale | e3f5bae2cc | fix: skip loading extra bias for Qwen2-VL GPTQ (#1040) | 1 week ago
AlpinDale | 18acf7eaa0 | tests: map physical device indices for test utils | 1 week ago
AlpinDale | 05be6085ec | core: factor out input preprocessing into a separate class (#1039) | 1 week ago
AlpinDale | fd07406a19 | fix: grouped_topk return type (#1038) | 1 week ago
AlpinDale | 271879a4a5 | fix: disable chunked prefill and prefix caching for multimodal models (#1037) | 1 week ago
AlpinDale | c951a54d21 | fix: multi-step + flashinfer with cuda graphs (#1036) | 1 week ago
AlpinDale | 055c8905a3 | api: add sampling/engine option to return only deltas or final output (#1035) | 1 week ago
AlpinDale | 1390915778 | multi-step: add support for flashinfer attention backend (#1033) | 1 week ago
AlpinDale | a56bce4c94 | fix: remove duplicate assignment in Hermes2ProToolParser | 1 week ago
AlpinDale | c6e8cb058b | fix: lazy init _copy_stream (#1032) | 1 week ago
AlpinDale | 3d72d8212a | chore: remove accidental commit | 1 week ago
AlpinDale | 8d5d87e687 | vlm: support multiple images for qwen-vl (#1031) | 1 week ago
AlpinDale | 41ceb754a6 | vlm: fix internvl2 inference with various num_patches (#1030) | 1 week ago
AlpinDale | 6f59024522 | torch.compile: hide slicing under custom op for inductor (#1029) | 1 week ago
AlpinDale | d51720114b | chore: use RoPE cache for MRoPE method (#1028) | 1 week ago
AlpinDale | 65a59bbb6b | cpu: raise error if using encoder-decoder models (#1027) | 1 week ago
AlpinDale | b33cf04386 | quants: add bitsandbytes support for gemma2 model (#1026) | 1 week ago
AlpinDale | 7d5feaa037 | api: fix logic for deciding if tool parser is used (#1025) | 1 week ago
AlpinDale | ddaefd8d38 | chore: remove engine_use_ray (#1024) | 1 week ago
AlpinDale | 304e1e5a8a | core: dump model runner inputs during crash (#1023) | 1 week ago
AlpinDale | 1721bea53a | vlm: add support for Pixtral model (#1022) | 1 week ago
AlpinDale | 0859dc3bc0 | tests: refactor speculative decoding tests to remove the async engine (#1021) | 1 week ago
AlpinDale | fe01e2ded8 | chore: move `device` keys to a constant (#1020) | 1 week ago
AlpinDale | a113309876 | kernel: add meta functions for ops to prevent graph breaks (#1019) | 1 week ago