Commit History

Author SHA1 Message Date
  AlpinDale be82f93463 refactor: factor out chat message parsing 4 months ago
  AlpinDale 87694c8aba feat: add RPC server and client via ZMQ (#615) 4 months ago
  AlpinDale 0227f8cf25 revert: incorrect nightly build 4 months ago
  AlpinDale 54f3a52609 chore: add env var to enable torch.compile 4 months ago
  AlpinDale cabca7383d fix: use loopback address for single node again 4 months ago
  AlpinDale 523ac99aca chore: pipeline parallel with Ray accelerated dag 4 months ago
  AlpinDale 141672a0d4 kernels: disambiguate quantized types via a new ScalarType 4 months ago
  AlpinDale 7844103e43 build: update torch to 2.4.0 for cpu 4 months ago
  AlpinDale 2050b42f3f fix: remove unused code in sampler 4 months ago
  AlpinDale 6124140a45 fix: remove error_on_invalid_device_count_status 4 months ago
  AlpinDale 8e50f26b71 fix: input shape for flashinfer prefill wrapper 4 months ago
  AlpinDale ebebccea6f chore: optimize get_seqs 4 months ago
  AlpinDale 614ca6b0bf feat: support logits soft capping with flash attention backend 4 months ago
  AlpinDale f6959b40ac fix: lower gemma's unloaded_params exception to warning 4 months ago
  AlpinDale 92987963a4 fix: RMSNorm forward in InternViT attention qk_layernorm 4 months ago
  AlpinDale b15e6376f8 bump to torch 2.4.0, add aphrodite_flash_attn (#614) 4 months ago
  AlpinDale 44d81c3a3e chore: bump torch to 2.4.0 4 months ago
  AlpinDale 9866af1626 chore: optimize scheduler and remove policy 4 months ago
  AlpinDale 3134fcaebc fix: set a default max_tokens for OAI requests 4 months ago
  AlpinDale cd31f8efbb chore: optimize PP comm by replacing send with partial send + allgather 4 months ago
  AlpinDale c9310eeb02 fix: skip loading lm_head for tie_word_embeddings models 4 months ago
  AlpinDale 1cd7056c46 fix: don't use torch.generator() for TPU 4 months ago
  AlpinDale d357341203 chore: add pipeline parallel support for Qwen 4 months ago
  AlpinDale 98f9dbd734 feat: Triton Kernels for Punica (#613) 4 months ago
  AlpinDale 5f23908977 chore: tune cutlass int8 kernels for sm_75 4 months ago
  AlpinDale 3ce81e215b chore: enable fp8 cutlass for ada lovelace 4 months ago
  AlpinDale 165a3aa7b3 fix: fp8 marlin and cpu offloading with fp8 marlin 4 months ago
  AlpinDale 5cb760162c feat: allow loading specific layer numbers per device 4 months ago
  AlpinDale f83eb07fd1 feat: use FusedMoE for jamba 4 months ago
  AlpinDale 9e9515f39a fix: feature size calculation for Llava-next 4 months ago