Commit History

Author SHA1 Message Date
  AlpinDale 4ed1bb9958 chore: add fault tolerance for RayTokenizerGroupPool 7 months ago
  AlpinDale 622de63c03 fix: remove useless code from cpu worker 7 months ago
  AlpinDale 80ac1cdc8f fix: add args for the draft tp 7 months ago
  AlpinDale abbb730607 feat: support draft model on different tensor parallel size 7 months ago
  AlpinDale 5974495461 chore: phi3v resize for dynamic shape 7 months ago
  AlpinDale e238abf0cc chore: send and recv helper functions 7 months ago
  AlpinDale 051daa0435 fix: add cutlass2x fallback kernels 7 months ago
  AlpinDale 3389dcdde5 fix: why the hell was this not committed? 7 months ago
  AlpinDale 25feb1d592 chore: add support for pinning lora adapters in the lru cache 7 months ago
  AlpinDale 0c662bc813 fix: exclude modelscope 1.15.0 7 months ago
  AlpinDale b3d2b639d2 feat: add `gelu_quick` CPU kernel 7 months ago
  AlpinDale 1b340083b1 feat: add shm broadcast 7 months ago
  AlpinDale 8a6e83b52e feat: fully sharded QKVParallelLinearWithLora support 7 months ago
  AlpinDale 4f42985b5c feat: qwen2 lora shapes 7 months ago
  AlpinDale af43576da0 feat: add MLPSpeculator speculative decoding support (#572) 7 months ago
  AlpinDale ead08e9711 fix: missing next_pow_2 header function 7 months ago
  AlpinDale 7a3e38f79c fix: cutlass kernel compilation 7 months ago
  AlpinDale 017b42c517 chore: use fork as the default method for mp backend 7 months ago
  AlpinDale cd9ed8623b fix: cuda version check for fp8 support in the cutlass kernels 7 months ago
  AlpinDale fad77538de feat: update cutlass int8 kernel configs for sm90 7 months ago
  AlpinDale b753ff7870 feat: per-channel support for static activation quant 7 months ago
  AlpinDale 3c7444c89b fix: asyncio.run hangs in python < 3.12 7 months ago
  AlpinDale d44ac8e497 fix: `--preemption_mode` -> `--preemption-mode` 7 months ago
  AlpinDale bcf9c83e6a fix: incorrect args passed to generate() method in phi3v example 7 months ago
  AlpinDale 025322ee5f fix: fp8 kv cache for qwen2 models 7 months ago
  AlpinDale 323fe23b21 chore: use 127.0.0.1 for single-node setups 7 months ago
  AlpinDale 89be49d058 fix: build for mi300x 7 months ago
  AlpinDale 7d3da17e19 fix: phi3 rope scaling 7 months ago
  AlpinDale 765adcfba1 chore: add w8a8 benchmark scripts 7 months ago
  AlpinDale 1587fab5de fix: cuda version check for mma warning suppression 7 months ago