AlpinDale | 0247bdcd27 | inline model switching | 1 week ago
AlpinDale | 3391557b64 | take yaml config in model load endpoint | 1 week ago
AlpinDale | 118bbfec5a | take more args in model load field | 1 week ago
AlpinDale | 8a4fc7f761 | add a simple model load endpoint | 1 week ago
AlpinDale | b94c8840b5 | fix model unload endpoint | 1 week ago
AlpinDale | 1d15cb1184 | Merge branch 'main' into mqaphrodite | 1 week ago
AlpinDale | 0b5588de5c | fix: add missing logit index increment in sampling metadata prep (#1059) | 1 week ago
AlpinDale | 80a9973f4c | add `dead_error` property to engine client | 1 week ago
AlpinDale | 3dc8016706 | Merge branch 'main' into mqaphrodite | 1 week ago
AlpinDale | 525edc1283 | build: fix compilation for causal_conv1d_fwd kernel signature (#1057) | 1 week ago
AlpinDale | b47a39026d | feat: introduce MQAphroditeEngine | 1 week ago
AlpinDale | 9bdf8d5bfa | mamba: enable continuous batching for mamba kernels (#1055) | 1 week ago
AlpinDale | 11f49b5341 | fix: granite logit scale in logit computation (#1054) | 1 week ago
AlpinDale | 1264e0b5d8 | api: add mistral function calling format to all models loaded with "mistral" format (#1053) | 1 week ago
AlpinDale | b3f9ab3b72 | quant: add tensor parallel support for bitsandbytes (#1052) | 1 week ago
AlpinDale | a985143768 | core: add cuda graph support for encoder-decoder models (#1051) | 1 week ago
AlpinDale | 239a8cae25 | torch.compile: register all-reduce operations as custom ops (#1050) | 1 week ago
AlpinDale | 4593a3b306 | chore: remove dead code from triton sampling kernels (#1049) | 1 week ago
AlpinDale | 8976805f90 | kernel: asymmetric AQ AZP quantization kernels (#1048) | 1 week ago
AlpinDale | 638c08d9dc | fix: clean shutdown issues (#1047) | 1 week ago
AlpinDale | 4b1b658855 | tpu: implement multi-step scheduling (#1046) | 1 week ago
AlpinDale | 960dee2f97 | torch.compile: fix functionalization (#1045) | 1 week ago
AlpinDale | ce7b602f03 | model: add support for MiniCPM-3 (#1044) | 1 week ago
AlpinDale | 4a7cb8f232 | rocm: add custom paged attention kernels for ROCm (#1043) | 1 week ago
AlpinDale | 6951928522 | xpu: bump IPEX to 2.3, support GQA (#1042) | 1 week ago
AlpinDale | 9797d38b24 | torch.compile: allow adding custom compile backends via plugins (#1041) | 1 week ago
AlpinDale | e3f5bae2cc | fix: skip loading extra bias for Qwen2-VL GPTQ (#1040) | 1 week ago
AlpinDale | 18acf7eaa0 | tests: map physical device indices for test utils | 1 week ago
AlpinDale | 05be6085ec | core: factor out input preprocessing into a separate class (#1039) | 1 week ago
AlpinDale | fd07406a19 | fix: grouped_topk return type (#1038) | 1 week ago