david/aphrodite-engine

Author	SHA1	Message	Date
AlpinDale	0e5cf7f840	tpu: avoid dynamo guard eval overhead (#949)	1 month ago
AlpinDale	bf4a4d8516	fix: do not register punica with torch if using older torch (#948)	1 month ago
AlpinDale	a90d41d908	tests: add kernel tests for causal_conv1d and mamba_ssm (#947)	1 month ago
AlpinDale	fcfcfc65e1	quants: add triton kernels for AWQ (#946)	1 month ago
AlpinDale	a62f0925fe	update flashinfer test (#945)	1 month ago
AlpinDale	4ddc14d653	core: use flashinfer for FP8 KV when available (#944)	1 month ago
khanonnie	e1eb7fbedc	fix: SentencePieceTokenizer error when using mistral tokenizer mode (#943)	1 month ago
AlpinDale	689ed70f4e	vlm: fix persimmon and fuyu issues with transformers 4.45 (#942)	1 month ago
AlpinDale	09324ea2ea	vlm: fix incompatibility between nested tensors and multi-image llava-next (#941)	1 month ago
AlpinDale	c5c09720b0	api: log prompt truncation (#940)	1 month ago
AlpinDale	0e2bfccda0	core: add virtual engine for async outproc (#939)	1 month ago
AlpinDale	f1ea7711bd	core: do not compile ScalarType for torch < 2.4.0 (#938)	1 month ago
AlpinDale	8b42b58228	vlm: stack multimodal tensors to represent multiple images within each prompt (#937)	1 month ago
AlpinDale	c50309d386	model: add support for paligemma2 (#936)	1 month ago
AlpinDale	03bd85c950	chore: multi-image support for llava-next (#935)	1 month ago
AlpinDale	9f3e7c86e2	feat: add fused Marlin MoE kernel (#934)	1 month ago
AlpinDale	9b76e7f39b	fix: phi3v image_idx in async server (#933)	1 month ago
AlpinDale	15cb8d5c26	xpu: support pipeline parallel (#932)	1 month ago
AlpinDale	436d8fa0f1	core: do not compile for profiling (#931)	1 month ago
AlpinDale	a3c03db735	fix: inline model loading conflicts with lora (#930)	1 month ago
AlpinDale	59d1d59028	api: support aphrodite_config.yaml with inline loading (#929)	1 month ago
AlpinDale	d46e70ac98	api: add inline model loading (#928)	1 month ago
AlpinDale	8d9f1fd4e6	feat: add single user mode (#927)	1 month ago
AlpinDale	53d0ba7c7c	api: add endpoint for loading and unloading the model (#926)	1 month ago
AlpinDale	f7f3fed265	feat: add async postprocessor (#925)	1 month ago
AlpinDale	5cb2e998d8	quants: update compressed tensors lifecycle to remove `prefix` from `create_weights` (#924)	1 month ago
AlpinDale	0c6d90dade	neuron: add support for tensor parallelism (#923)	1 month ago
AlpinDale	2940da2c7b	distributed: fix custom allreduce p2p cache file generation (#922)	1 month ago
AlpinDale	5d9021969c	quants: update `qqq` and `gptq_marlin_24` to use AphroditeParameters (#921)	1 month ago
AlpinDale	9c9b2dd843	core: improve warmup times for prefix caching in block manager v2 (#920)	1 month ago