david/aphrodite-engine

Author	SHA1 Message	Date
AlpinDale	6600c082bc chore: pass bias to quant_method.apply	4 months ago
AlpinDale	3a53ff1e01 fix: raise an error for no draft token case when draft_tp>1	4 months ago
AlpinDale	9fd01e6358 fix: the metrics endpoint was not mounted	4 months ago
AlpinDale	00503b9fc1 feat: non-uniform quantization via `compressed-tensors` for llama	4 months ago
AlpinDale	2c653a2268 fix: make speculative decoding work with per-request seed	4 months ago
AlpinDale	b7a2d52e47 fix: allow using mp executor for pipeline parallel	4 months ago
AlpinDale	e90ad4acec chore: implement fallback for fp8 channelwise using torch._scaled_mm	4 months ago
AlpinDale	19340b672e chore: improve min_capability checking for `compressed-tensors`	4 months ago
AlpinDale	b6c4dfce23 chore: refactor TPU model runner and worker	4 months ago
AlpinDale	8adc496a2a fix: use paged attention for block swapping/copying in flashinfer	4 months ago
AlpinDale	a26f784240 chore: use the LoRA tokenizer in OpenAI API (#599)	4 months ago
AlpinDale	8ee8483fcf `enable_gpu_advance_step` -> `allo_gpu_advance_step`	4 months ago
AlpinDale	052a6e1eb6 feat: add SPMD worker execution using Ray accelerated DAG	4 months ago
AlpinDale	65a97216a7 fix: avoid secondary error in ShmRingBuffer destructor	4 months ago
AlpinDale	6671e3a162 feat: add CPU offloading support (#598)	4 months ago
AlpinDale	fb4c01740c feat: add asymmetric TP support for Qwen2	4 months ago
AlpinDale	ee2c5d34da feat: add fp8 channel-wise weight quantization support	4 months ago
AlpinDale	6c4c20652b feat: pipeline parallel support for mixtral	4 months ago
AlpinDale	196e6b64f1 feat: add fp8 dynamic per-token quant kernel	4 months ago
AlpinDale	5dbfc200f2 update all benchmarks (#597)	4 months ago
AlpinDale	dd18c5042c move prepare_inputs to the GPU (#596)	4 months ago
AlpinDale	22305c91e9 refactor _prepare_model_input_tensor and attn metadata builder for most backends	4 months ago
AlpinDale	e8af0d4a3b fix: type annotation in worker	4 months ago
AlpinDale	8c2dd39500 chore: remove multimodal stuff from TPU	4 months ago
AlpinDale	6f8beb8583 fix: 4-node crash with PP	4 months ago
AlpinDale	d638dc592d fix: some minor typing issues in spec decode	4 months ago
AlpinDale	0b2ae31122 cleanup rocm dockerfile	4 months ago
AlpinDale	0429cb2229 fix: only create embeddings and lm_head when necessary for PP	4 months ago
AlpinDale	2dfa4e47e6 chore: set seed for dummy weights init	4 months ago
AlpinDale	f5d52320da Port mamba kernels to Aphrodite (#595)	4 months ago
