david/aphrodite-engine

Author	SHA1 Message	Date
AlpinDale	4599c98f99 feat: dynamic image size support for VLMs	7 months ago
AlpinDale	5be90c3859 Mamba infrastrucuture support (#586)	7 months ago
AlpinDale	ae04f57ec1 feat: Pipeline Parallel support (#581)	7 months ago
AlpinDale	3a0fdf7b9b chore: remove `image_input_type` from VLM config	7 months ago
AlpinDale	b6e60143e7 Flashinfer for prefill phase (#580)	7 months ago
AlpinDale	cdff8e89f9 feat: introduce `DraftModelRunner`	7 months ago
AlpinDale	c0c336aaa3 refactor: registry for processing model inputs; quick_gelu; clip model support	7 months ago
AlpinDale	56e0b8223c chore: add base class for LoRA-supported models	7 months ago
AlpinDale	dead030abf fix: cuda graph with MLPSpeculator	7 months ago
AlpinDale	405bb74612 Control plane comms refactor (#573)	7 months ago
AlpinDale	25feb1d592 chore: add support for pinning lora adapters in the lru cache	7 months ago
AlpinDale	af43576da0 feat: add MLPSpeculator speculative decoding support (#572)	7 months ago
AlpinDale	34b41e0a87 chore: add coordinator to reduce code duplication in tp and pp	7 months ago
AlpinDale	d0cca80b8b feat: support sharded tensorizer models	7 months ago
AlpinDale	4d1e613804 chore: minor simplifications	7 months ago
AlpinDale	6cecbbff6a fix: reduce memory footprint of cuda graph by adding output buffer	7 months ago
AlpinDale	c975bba905 fix: sharded state loader with lora	7 months ago
AlpinDale	e321d80e4e fix: `prompt_logprobs==0` case	7 months ago
AlpinDale	8d77c69cbd feat: support image processor and add llava example	7 months ago
AlpinDale	08f639b8aa remove duplicate seq_lens_tensor	7 months ago
AlpinDale	f40b809d3b allow using v2 block manager with sliding window	7 months ago
AlpinDale	5b0c11d190 support pipeline parallel pynccl groups	8 months ago
AlpinDale	de62ceb18c refactor: eliminate parallel worker per-step task scheduling overhead	8 months ago
AlpinDale	656459fd84 make fp8_e4m3 work on nvidia	8 months ago
AlpinDale	0aaf2dfc6b improve parallel logging	8 months ago
AlpinDale	9e73559eba make use of batched rotary embedding kernels to support long context lora	8 months ago
AlpinDale	eaa06fdd14 fix some f-strings	8 months ago
AlpinDale	c58589318f remove the graph mode func	8 months ago
AlpinDale	072b30fb42 measure end time within the cuda memory profiler	8 months ago
AlpinDale	7bcff4ac03 implement sharded state dict	8 months ago



Commit History