Author | Commit | Message | Date
AlpinDale | 84163654f4 | tokenizer: allow skip_special_tokens=False for mistral tokenizer | 1 week ago
AlpinDale | 525edc1283 | build: fix compilation for causal_conv1d_fwd kernel signature (#1057) | 1 week ago
AlpinDale | 9bdf8d5bfa | mamba: enable continuous batching for mamba kernels (#1055) | 1 week ago
AlpinDale | 11f49b5341 | fix: granite logit scale in logit computation (#1054) | 1 week ago
AlpinDale | 1264e0b5d8 | api: add mistral function calling format to all models loaded with "mistral" format (#1053) | 1 week ago
AlpinDale | b3f9ab3b72 | quant: add tensor parallel support for bitsandbytes (#1052) | 1 week ago
AlpinDale | a985143768 | core: add cuda graph support for encoder-decoder models (#1051) | 1 week ago
AlpinDale | 239a8cae25 | torch.compile: register all-reduce operations as custom ops (#1050) | 1 week ago
AlpinDale | 4593a3b306 | chore: remove dead code from triton sampling kernels (#1049) | 1 week ago
AlpinDale | 8976805f90 | kernel: asymmetric AQ AZP quantization kernels (#1048) | 1 week ago
AlpinDale | 638c08d9dc | fix: clean shutdown issues (#1047) | 1 week ago
AlpinDale | 4b1b658855 | tpu: implement multi-step scheduling (#1046) | 1 week ago
AlpinDale | 960dee2f97 | torch.compile: fix functionalization (#1045) | 1 week ago
AlpinDale | ce7b602f03 | model: add support for MiniCPM-3 (#1044) | 1 week ago
AlpinDale | 4a7cb8f232 | rocm: add custom paged attention kernels for ROCm (#1043) | 1 week ago
AlpinDale | 6951928522 | xpu: bump IPEX to 2.3, support GQA (#1042) | 1 week ago
AlpinDale | 9797d38b24 | torch.compile: allow adding custom compile backends via plugins (#1041) | 1 week ago
AlpinDale | e3f5bae2cc | fix: skip loading extra bias for Qwen2-VL GPTQ (#1040) | 1 week ago
AlpinDale | 18acf7eaa0 | tests: map physical device indices for test utils | 1 week ago
AlpinDale | 05be6085ec | core: factor out input preprocessing into a separate class (#1039) | 1 week ago
AlpinDale | fd07406a19 | fix: grouped_topk return type (#1038) | 1 week ago
AlpinDale | 271879a4a5 | fix: disable chunked prefill and prefix caching for multimodal models (#1037) | 1 week ago
AlpinDale | c951a54d21 | fix: multi-step + flashinfer with cuda graphs (#1036) | 1 week ago
AlpinDale | 055c8905a3 | api: add sampling/engine option to return only deltas or final output (#1035) | 1 week ago
AlpinDale | 1390915778 | multi-step: add support for flashinfer attention backend (#1033) | 1 week ago
AlpinDale | a56bce4c94 | fix: remove duplicate assignment in Hermes2ProToolParser | 1 week ago
AlpinDale | c6e8cb058b | fix: lazy init _copy_stream (#1032) | 1 week ago
AlpinDale | 3d72d8212a | chore: remove accidental commit | 1 week ago
AlpinDale | 8d5d87e687 | vlm: support multiple images for qwen-vl (#1031) | 1 week ago
AlpinDale | 41ceb754a6 | vlm: fix internvl2 inference with various num_patches (#1030) | 1 week ago