AlpinDale | 34b41e0a87 | chore: add coordinator to reduce code duplication in tp and pp | 7 months ago
AlpinDale | d0cca80b8b | feat: support sharded tensorizer models | 7 months ago
AlpinDale | 4d1e613804 | chore: minor simplifications | 7 months ago
AlpinDale | 6cecbbff6a | fix: reduce memory footprint of cuda graph by adding output buffer | 7 months ago
AlpinDale | c975bba905 | fix: sharded state loader with lora | 7 months ago
AlpinDale | e321d80e4e | fix: `prompt_logprobs==0` case | 7 months ago
AlpinDale | 8d77c69cbd | feat: support image processor and add llava example | 7 months ago
AlpinDale | 08f639b8aa | remove duplicate seq_lens_tensor | 7 months ago
AlpinDale | f40b809d3b | allow using v2 block manager with sliding window | 7 months ago
AlpinDale | 5b0c11d190 | support pipeline parallel pynccl groups | 7 months ago
AlpinDale | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago
AlpinDale | 0aaf2dfc6b | improve parallel logging | 7 months ago
AlpinDale | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago
AlpinDale | eaa06fdd14 | fix some f-strings | 7 months ago
AlpinDale | c58589318f | remove the graph mode func | 7 months ago
AlpinDale | 072b30fb42 | measure end time within the cuda memory profiler | 7 months ago
AlpinDale | 7bcff4ac03 | implement sharded state dict | 7 months ago
AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 7 months ago
AlpinDale | 01190e5049 | use flash attention for the decoding phase | 8 months ago
AlpinDale | 50b7c13db0 | refactor: attention selector (#552) | 8 months ago
AlpinDale | b984fe4a91 | refactor custom allreduce to support multiple tp groups | 8 months ago
AlpinDale | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 8 months ago
AlpinDale | 8ae2cce237 | refactor pynccl | 8 months ago
AlpinDale | 0e062e66d3 | set block size at init | 8 months ago
AlpinDale | b55381df0e | speed up lora loading times by reusing the cpu dummy lora | 8 months ago
AlpinDale | 3a0d1c7705 | add get_name method to attention backends | 8 months ago
AlpinDale | 2351a0e2cd | feat: FlashInfer backend for decoding phase (#548) | 8 months ago
AlpinDale | 35ae01d7ba | refactor: attention metadata term | 8 months ago
AlpinDale | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 8 months ago