david/aphrodite-engine

Author	SHA1 Message	Date
AlpinDale	8b42b58228 vlm: stack multimodal tensors to represent multiple images within each prompt (#937)	2 weeks ago
AlpinDale	c50309d386 model: add support for paligemma2 (#936)	2 weeks ago
AlpinDale	03bd85c950 chore: multi-image support for llava-next (#935)	2 weeks ago
AlpinDale	9f3e7c86e2 feat: add fused Marlin MoE kernel (#934)	2 weeks ago
AlpinDale	9b76e7f39b fix: phi3v image_idx in async server (#933)	2 weeks ago
AlpinDale	15cb8d5c26 xpu: support pipeline parallel (#932)	2 weeks ago
AlpinDale	436d8fa0f1 core: do not compile for profiling (#931)	2 weeks ago
AlpinDale	a3c03db735 fix: inline model loading conflicts with lora (#930)	3 weeks ago
AlpinDale	59d1d59028 api: support aphrodite_config.yaml with inline loading (#929)	3 weeks ago
AlpinDale	d46e70ac98 api: add inline model loading (#928)	3 weeks ago
AlpinDale	8d9f1fd4e6 feat: add single user mode (#927)	3 weeks ago
AlpinDale	53d0ba7c7c api: add endpoint for loading and unloading the model (#926)	3 weeks ago
AlpinDale	f7f3fed265 feat: add async postprocessor (#925)	3 weeks ago
AlpinDale	5cb2e998d8 quants: update compressed tensors lifecycle to remove `prefix` from `create_weights` (#924)	3 weeks ago
AlpinDale	0c6d90dade neuron: add support for tensor parallelism (#923)	3 weeks ago
AlpinDale	2940da2c7b distributed: fix custom allreduce p2p cache file generation (#922)	3 weeks ago
AlpinDale	5d9021969c quants: update `qqq` and `gptq_marlin_24` to use AphroditeParameters (#921)	3 weeks ago
AlpinDale	9c9b2dd843 core: improve warmup times for prefix caching in block manager v2 (#920)	3 weeks ago
AlpinDale	0c162c8dad api: use fp32 for base64 embeddings (#919)	3 weeks ago
AlpinDale	3b684a8a54 spec decode: streamline batch expansion tensor manipulation (#918)	3 weeks ago
AlpinDale	fce970a846 feat: multi-image input support for Phi3V (#917)	3 weeks ago
AlpinDale	178c2141d4 fix: phi3v crash with unusual image sizes (#916)	3 weeks ago
AlpinDale	f61acdd3ec api: add json_schema to OpenAI server (#915)	3 weeks ago
AlpinDale	b1492c1529 core: add multi-step scheduling support for the synchronous engine (#914)	3 weeks ago
AlpinDale	799667737b quantization: update marlin to use `AphroditeParameters` (#913)	3 weeks ago
AlpinDale	16e5b2be8b fix: empty prompt crashing the server (#912)	3 weeks ago
AlpinDale	673621a3d2 xpu: refactor the model runner for tensor parallelism (#910)	3 weeks ago
AlpinDale	d69273bd2b ray: better error when placement group topology is incorrect (#906)	3 weeks ago
AlpinDale	6fbab320e7 api: error suppression cleanup + timeout suppression on aborts (#905)	3 weeks ago
AlpinDale	ab533e0e60 spec decode: fix logprobs when using speculative decoding (#904)	3 weeks ago
