Author    | Commit     | Message                                                                           | Date
AlpinDale | 2573b36f6a | feat: allow image embeddings for VLM input (#686)                                 | 4 months ago
AlpinDale | 300f889554 | chore: update flashinfer to v0.1.3 (#685)                                         | 4 months ago
AlpinDale | 4ca9aaaf3c | build: add empty device (#684)                                                    | 4 months ago
AlpinDale | b03fa02397 | refactor: base worker input refactor for multi-step (#683)                        | 4 months ago
AlpinDale | 8cfbe62a7c | chore: bump lmfe to v0.10.6 and include triton for tpu and xpu dockerfiles (#682) | 4 months ago
AlpinDale | 06cd48ea5c | chore: use mark_dynamic to reduce TPU compile times (#681)                        | 4 months ago
AlpinDale | fa5553b20f | fix: phi3v batch inference with different aspect ratio images (#680)              | 4 months ago
AlpinDale | 79d603954e | fix: chunked prefill with v2 block manager (#679)                                 | 4 months ago
AlpinDale | 3bbb3f2086 | feat: add numpy implementation of `compute_slot_mapping` (#678)                   | 4 months ago
AlpinDale | df208ab4e9 | fix: fp8 checkpoints with fused linear modules (#677)                             | 4 months ago
AlpinDale | 81fa31bcaf | feat: embeddings support for batched OAI endpoint (#676)                          | 4 months ago
AlpinDale | c2bb886b2e | fix: reinit procedure in `ModelInputForGPUBuilder` (#675)                         | 4 months ago
AlpinDale | bf88c8567e | feat: mamba model support (#674)                                                  | 4 months ago
AlpinDale | 8583aefed7 | chore: mamba cache single buffer (#673)                                           | 4 months ago
AlpinDale | 19ad952dd4 | chore: better stream termination in async engine (#672)                           | 4 months ago
AlpinDale | 1394008421 | chore: decouple `should_modify_greedy_probs_inplace` (#671)                       | 4 months ago
AlpinDale | 2da6a3ec2b | feat: option to apply temperature scaling last (#670)                             | 4 months ago
AlpinDale | e3a53712f2 | fix: mlpspeculator with padded vocab (#669)                                       | 4 months ago
AlpinDale | e200775863 | feat: enable using fp8 kv and prefix caching with chunked prefill (#668)          | 4 months ago
AlpinDale | ef40c05cd3 | fix: minor adjustments to scheduler and block manager (#667)                      | 4 months ago
AlpinDale | 7df7b8ca53 | optimization: reduce end-to-end overhead from python obj allocation (#666)        | 4 months ago
AlpinDale | ea78357d70 | fix: deps with TPU dockerfile (#665)                                              | 4 months ago
AlpinDale | 62111fab17 | feat: allow serving encoder-decoder models in the API server (#664)               | 4 months ago
AlpinDale | 3f49a55f82 | feat: add INT8 W8A16 quant for TPU (#663)                                         | 4 months ago
AlpinDale | 5dd0145414 | chore: update the env.py script and the bug report template (#662)                | 4 months ago
AlpinDale | 1927ce2be4 | fix: `get_num_blocks_touched` logic (#661)                                        | 4 months ago
AlpinDale | ed9a6f97c1 | fix: kill api server when pinging dead engine (#660)                              | 4 months ago
AlpinDale | 6d54f7687d | fix: lora with pipeline parallel (#659)                                           | 4 months ago
AlpinDale | 3405782f24 | fix: max_num_batched_tokens should not be limited for lora (#658)                 | 4 months ago
AlpinDale | 67ee885293 | fix: flashinfer outputs (#657)                                                    | 4 months ago