Commit History

Author SHA1 Message Date
  AlpinDale 6600c082bc chore: pass bias to quant_method.apply 4 months ago
  AlpinDale 3a53ff1e01 fix: raise an error for no draft token case when draft_tp>1 4 months ago
  AlpinDale 9fd01e6358 fix: the metrics endpoint was not mounted 4 months ago
  AlpinDale 00503b9fc1 feat: non-uniform quantization via `compressed-tensors` for llama 4 months ago
  AlpinDale 2c653a2268 fix: make speculative decoding work with per-request seed 4 months ago
  AlpinDale b7a2d52e47 fix: allow using mp executor for pipeline parallel 4 months ago
  AlpinDale e90ad4acec chore: implement fallback for fp8 channelwise using torch._scaled_mm 4 months ago
  AlpinDale 19340b672e chore: improve min_capability checking for `compressed-tensors` 4 months ago
  AlpinDale b6c4dfce23 chore: refactor TPU model runner and worker 4 months ago
  AlpinDale 8adc496a2a fix: use paged attention for block swapping/copying in flashinfer 4 months ago
  AlpinDale a26f784240 chore: use the LoRA tokenizer in OpenAI API (#599) 4 months ago
  AlpinDale 8ee8483fcf `enable_gpu_advance_step` -> `allow_gpu_advance_step` 4 months ago
  AlpinDale 052a6e1eb6 feat: add SPMD worker execution using Ray accelerated DAG 4 months ago
  AlpinDale 65a97216a7 fix: avoid secondary error in ShmRingBuffer destructor 4 months ago
  AlpinDale 6671e3a162 feat: add CPU offloading support (#598) 4 months ago
  AlpinDale fb4c01740c feat: add asymmetric TP support for Qwen2 4 months ago
  AlpinDale ee2c5d34da feat: add fp8 channel-wise weight quantization support 4 months ago
  AlpinDale 6c4c20652b feat: pipeline parallel support for mixtral 4 months ago
  AlpinDale 196e6b64f1 feat: add fp8 dynamic per-token quant kernel 4 months ago
  AlpinDale 5dbfc200f2 update all benchmarks (#597) 4 months ago
  AlpinDale dd18c5042c move prepare_inputs to the GPU (#596) 4 months ago
  AlpinDale 22305c91e9 refactor _prepare_model_input_tensor and attn metadata builder for most backends 4 months ago
  AlpinDale e8af0d4a3b fix: type annotation in worker 4 months ago
  AlpinDale 8c2dd39500 chore: remove multimodal stuff from TPU 4 months ago
  AlpinDale 6f8beb8583 fix: 4-node crash with PP 4 months ago
  AlpinDale d638dc592d fix: some minor typing issues in spec decode 4 months ago
  AlpinDale 0b2ae31122 cleanup rocm dockerfile 4 months ago
  AlpinDale 0429cb2229 fix: only create embeddings and lm_head when necessary for PP 4 months ago
  AlpinDale 2dfa4e47e6 chore: set seed for dummy weights init 4 months ago
  AlpinDale f5d52320da Port mamba kernels to Aphrodite (#595) 4 months ago