AlpinDale | 0e5cf7f840 | tpu: avoid dynamo guard eval overhead (#949) | 1 month ago
AlpinDale | bf4a4d8516 | fix: do not register punica with torch if using older torch (#948) | 1 month ago
AlpinDale | a90d41d908 | tests: add kernel tests for causal_conv1d and mamba_ssm (#947) | 1 month ago
AlpinDale | fcfcfc65e1 | quants: add triton kernels for AWQ (#946) | 1 month ago
AlpinDale | a62f0925fe | update flashinfer test (#945) | 1 month ago
AlpinDale | 4ddc14d653 | core: use flashinfer for FP8 KV when available (#944) | 1 month ago
khanonnie | e1eb7fbedc | fix: SentencePieceTokenizer error when using mistral tokenizer mode (#943) | 1 month ago
AlpinDale | 689ed70f4e | vlm: fix persimmon and fuyu issues with transformers 4.45 (#942) | 1 month ago
AlpinDale | 09324ea2ea | vlm: fix incompatibility nested tensors and multi-image llava-next (#941) | 1 month ago
AlpinDale | c5c09720b0 | api: log prompt truncation (#940) | 1 month ago
AlpinDale | 0e2bfccda0 | core: add virtual engine for async outproc (#939) | 1 month ago
AlpinDale | f1ea7711bd | core: do not compile ScalarType for torch < 2.4.0 (#938) | 1 month ago
AlpinDale | 8b42b58228 | vlm: stack multimodal tensors to represent multiple images within each prompt (#937) | 1 month ago
AlpinDale | c50309d386 | model: add support for paligemma2 (#936) | 1 month ago
AlpinDale | 03bd85c950 | chore: multi-image support for llava-next (#935) | 1 month ago
AlpinDale | 9f3e7c86e2 | feat: add fused Marlin MoE kernel (#934) | 1 month ago
AlpinDale | 9b76e7f39b | fix: phi3v image_idx in async server (#933) | 1 month ago
AlpinDale | 15cb8d5c26 | xpu: support pipeline parallel (#932) | 1 month ago
AlpinDale | 436d8fa0f1 | core: do not compile for profiling (#931) | 1 month ago
AlpinDale | a3c03db735 | fix: inline model loading conflicts with lora (#930) | 1 month ago
AlpinDale | 59d1d59028 | api: support aphrodite_config.yaml with inline loading (#929) | 1 month ago
AlpinDale | d46e70ac98 | api: add inline model loading (#928) | 1 month ago
AlpinDale | 8d9f1fd4e6 | feat: add single user mode (#927) | 1 month ago
AlpinDale | 53d0ba7c7c | api: add endpoint for loading and unloading the model (#926) | 1 month ago
AlpinDale | f7f3fed265 | feat: add async postprocessor (#925) | 1 month ago
AlpinDale | 5cb2e998d8 | quants: update compressed tensors lifecycle to remove `prefix` from `create_weights` (#924) | 1 month ago
AlpinDale | 0c6d90dade | neuron: add support for tensor parallelism (#923) | 1 month ago
AlpinDale | 2940da2c7b | distributed: fix custom allreduce p2p cache file generation (#922) | 1 month ago
AlpinDale | 5d9021969c | quants: update `qqq` and `gptq_marlin_24` to use AphroditeParameters (#921) | 1 month ago
AlpinDale | 9c9b2dd843 | core: improve warmup times for prefix caching in block manager v2 (#920) | 1 month ago