david/aphrodite-engine

Author	SHA1 Message	Date
AlpinDale	3f49a55f82 feat: add INT8 W8A16 quant for TPU (#663)	4 months ago
AlpinDale	5dd0145414 chore: update the env.py script and the bug report template (#662)	4 months ago
AlpinDale	1927ce2be4 fix: `get_num_blocks_touched` logic (#661)	4 months ago
AlpinDale	ed9a6f97c1 fix: kill api server when pinging dead engine (#660)	4 months ago
AlpinDale	6d54f7687d fix: lora with pipeline parallel (#659)	4 months ago
AlpinDale	3405782f24 fix: max_num_batched_tokens should not be limited for lora (#658)	4 months ago
AlpinDale	67ee885293 fix: flashinfer outputs (#657)	4 months ago
AlpinDale	0e5bb11503 fix: make `merge_async_iterators.is_cancelled()` optional (#656)	4 months ago
AlpinDale	3170c0d4c6 fix: GPTQ/AWQ on Colab (#655)	4 months ago
AlpinDale	83bcb9119a fix: multiprocessing timeout (#654)	4 months ago
AlpinDale	1e119cbeb6 fix: input processor in internvl2 (#653)	4 months ago
AlpinDale	a2344d3617 fix: move zeromq rpc frontend to IPC instead of TCP (#652)	4 months ago
AlpinDale	f1e1d0bd3d feat: introduce `BaseAphroditeParameter` (#646)	4 months ago
AlpinDale	47ac074937 fix: RSLoRA support (#647)	4 months ago
50h100a	b96ba9930e Merge pull request #644 from 50h100a/quadfix	4 months ago
AlpinDale	59264d32e9 fix: hardcoded float16 in embedding mode check (#645)	4 months ago
50h100a	cbdf2d986f quadratic sampling: separate diff from logits to avoid NaNs.	4 months ago
AlpinDale	31f82da8bd chore: deduplicate nvlink check to cuda platform (#643)	4 months ago
AlpinDale	3648170750 fix: gracefully handle missing chat template (#642)	4 months ago
AlpinDale	77c4fbd5c9 fix: better async request cancellation (#641)	4 months ago
AlpinDale	a03e0e2ea4 ci: exclude cu118 and cu121 from build and add py_limited_api (#639)	4 months ago
AlpinDale	db81a67c54 bump to v0.6.0.post1 (#635)	4 months ago
AlpinDale	c147670c13 fix: clean up incorrect log in worker (#636)	4 months ago
AlpinDale	308501daa5 fix: default api port and attention selector (#634)	4 months ago
AlpinDale	a0e446a17d feat: initial encoder-decoder support with BART model (#633)	4 months ago
AlpinDale	337071f484 chore: optimize evictor v2 performance (#631)	4 months ago
AlpinDale	a401f8e05d feat: per-tensor token epilogue kernels (#630)	4 months ago
AlpinDale	09b82f9963 feat: Add support for GPU device selection in SpecDecodeBaseSampler (#629)	4 months ago
AlpinDale	f5cca12da8 feat: multi-image input for minicpmv (#628)	4 months ago
Trapper4888	ba848b00f3 readme: fix model name typo (#627)	4 months ago

Commit History