AlpinDale | a0f0160b79 | spec decode: remove dead code from draft bonus tokens (#1101) | 2 weeks ago
AlpinDale | a5bfc2bc3d | VLM: add support for LLaVA-Onevision model (#1100) | 2 weeks ago
AlpinDale | d44da0332c | misc: rename `CudaMemoryProfiler` to `DeviceMemoryProfiler` (#1099) | 2 weeks ago
AlpinDale | 7ce3174039 | VLM: refactor blip models to support composite weight loading (#1098) | 2 weeks ago
AlpinDale | 91d03c04d2 | VLM: refactor composite weight loading logic (#1097) | 2 weeks ago
AlpinDale | b65449b5ad | moe: refactor DBRX experts to support FusedMoE (#1095) | 2 weeks ago
AlpinDale | ed63c079f7 | Triton: remove atomic add op from awq triton (#1094) | 2 weeks ago
AlpinDale | 651678d2df | VLM: use `SequenceData.from_token_counts` to create dummy data (#1093) | 2 weeks ago
AlpinDale | 7fffa507ff | build: build flash attention kernels inside aphrodite (#1085) | 2 weeks ago
AlpinDale | 3d5b97837f | ci: fix the tag for :latest docker | 2 weeks ago
AlpinDale | d96c363301 | api: fix admin key being required for authentication (#1091) | 2 weeks ago
AlpinDale | 1fac86c325 | core: factor out common code in SequenceData (#1083) | 1 month ago
AlpinDale | ad1205b277 | readme: update attributions (#1082) | 1 month ago
AlpinDale | 193fcee016 | chore: check for torch 2.4.0 when registering custom op (#1081) | 1 month ago
AlpinDale | 86bf2cc4f3 | core: rename `PromptInputs,inputs` -> `PromptType,prompt` (#1080) | 1 month ago
AlpinDale | 766ea79b89 | vlm: fix feature size calculation for llava-next models (#1079) | 1 month ago
AlpinDale | 7b6501bd05 | tests: refactor model tests (#1078) | 1 month ago
AlpinDale | f6df92bde0 | fix: unexpected kwarg for the legacy API server (#1076) | 1 month ago
AlpinDale | 814c850d89 | fix: validate `n` in the sampling params (#1075) | 1 month ago
AlpinDale | 6212072245 | api: support LoRA lineage and base model metadata management (#1072) | 1 month ago
AlpinDale | d9d287a288 | rocm: enable multi-step scheduling for rocm (#1071) | 1 month ago
AlpinDale | ec17b6c4d0 | fix: Phi3.5 Mini and MoE LoRA inference (#1070) | 1 month ago
AlpinDale | acc0c727c8 | vlm: add support for molmo vision model (#1069) | 1 month ago
AlpinDale | 7dd001ec2d | build: guard against changes in cuda library name (#1068) | 1 month ago
AlpinDale | ca7028d5ca | sampler: simplify logits resort in _apply_top_k_top_p (#1067) | 1 month ago
AlpinDale | 61aed092a5 | rocm: add support for FP8 KV cache in the custom paged attention kernels (#1066) | 1 month ago
AlpinDale | 12b0059b47 | api: enable MQAphroditeEngine for embedding models (#1065) | 1 month ago
AlpinDale | 314fa7f7d9 | fix: encoder-decoder models for beam search (#1064) | 1 month ago
AlpinDale | 34cf9b74f0 | api: non-zero exit code if MQ engine startup fails (#1063) | 1 month ago
AlpinDale | 92cee435e2 | rocm: add more quants, fix _scaled_mm call (#1062) | 1 month ago