Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| AlpinDale | 7253e9052d | feat: integrate typical acceptance sampling for spec decoding | 5 months ago |
| AlpinDale | b8a19ba27f | chore: extend aphrodite metrics logging api | 5 months ago |
| AlpinDale | bbde979ecd | DeepSeek-V2 (#579) | 5 months ago |
| AlpinDale | b8650ec51d | fix: better error message for MLPSpeculator | 5 months ago |
| AlpinDale | 0886c361f4 | feat: OpenVINO CPU backend (#576) | 5 months ago |
| AlpinDale | c0c336aaa3 | refactor: registry for processing model inputs; quick_gelu; clip model support | 5 months ago |
| AlpinDale | 51cfadeb29 | fix: `MLPSpeculator` handling of `num_speculative_tokens` | 5 months ago |
| AlpinDale | 2c321ce1f2 | chore: upgrade to rocm 6.1, update docker | 5 months ago |
| AlpinDale | 80ac1cdc8f | fix: add args for the draft tp | 5 months ago |
| AlpinDale | af43576da0 | feat: add MLPSpeculator speculative decoding support (#572) | 5 months ago |
| AlpinDale | 0613d91551 | fix: kv head calculation with MPT GQA | 5 months ago |
| AlpinDale | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 5 months ago |
| AlpinDale | e4407bbcb7 | fix: do not start a ray cluster when not using ray | 5 months ago |
| AlpinDale | ee174ea4fd | fix: guard for lora + chunked prefill | 5 months ago |
| AlpinDale | a89c9a0e92 | fix: device ordinal issues with world_size and stuff | 5 months ago |
| AlpinDale | 06ed127441 | fix: do not raise optimization warning for fp8 quant | 5 months ago |
| AlpinDale | fe21123a1c | feat: TPU support (#570) | 5 months ago |
| AlpinDale | fa58ba87a3 | fix: only set executor backend to mp if not multi-node | 5 months ago |
| AlpinDale | bba89fc6d3 | chore: make the automatic rope scaling behave properly with rope_scaling arg, add rope theta | 5 months ago |
| AlpinDale | 517676249c | chore: update the compressed-tensors config | 5 months ago |
| AlpinDale | 76d6f49bbb | fix: modelscope downloads | 5 months ago |
| AlpinDale | f2e94e2184 | chore: minor llava cleanups in preparation for llava-next | 5 months ago |
| AlpinDale | 237fa59aea | feat: support CPU/GPU swapping in BlockManagerV2 | 5 months ago |
| AlpinDale | 8d77c69cbd | feat: support image processor and add llava example | 5 months ago |
| AlpinDale | 690110a051 | feat: bitsandbytes quantization | 5 months ago |
| AlpinDale | 0307da9e15 | refactor: bitsandbytes -> autoquant | 5 months ago |
| AlpinDale | 072aec1062 | automatically detect sparseml models | 5 months ago |
| AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 5 months ago |
| AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 6 months ago |
| AlpinDale | 60e74e92fd | add rope_scaling arg | 6 months ago |