Commit History

Author SHA1 Message Date
  AlpinDale b029a544ff optimize eager mode host time with numpy 7 months ago
  AlpinDale ced1b36b8b feat: support head size of 192 7 months ago
  AlpinDale 4ab4c5c87c oops 7 months ago
  AlpinDale 9e79a15b9f fix: ignore warnings for sparseml 7 months ago
  AlpinDale d45c846c8c do not build sm_90a for cuda 11 7 months ago
  AlpinDale 08f639b8aa remove duplicate seq_lens_tensor 7 months ago
  AlpinDale 072aec1062 automatically detect sparseml models 7 months ago
  AlpinDale 5cedee9024 fix gemma with gptq marlin 7 months ago
  AlpinDale 9d19811d4f avoid the need to pass `None` values to `Sequence.inputs` 7 months ago
  AlpinDale f2b7a42c4e fix: async cancels in merge_async_iterators for python>=3.9 7 months ago
  AlpinDale 9099040472 feat: cross-attention kv caching support 7 months ago
  AlpinDale b2fd915c35 improve p2p access check 7 months ago
  AlpinDale 7194047318 remove vllm-nccl 7 months ago
  AlpinDale 6785d78d82 fix: do not expose EOS token in the API 7 months ago
  AlpinDale 90ceab32ff refactor: consolidate prompt args to LLM engines 7 months ago
  AlpinDale e4ea3da1ad fix: tensor parallel with embedding model 7 months ago
  AlpinDale f40b809d3b allow using v2 block manager with sliding window 7 months ago
  AlpinDale 2649f3f14e aqlm works on pascal 7 months ago
  AlpinDale ac79d115b3 add guards for prefix caching, fp8, chunked, etc 7 months ago
  AlpinDale 344ddaac5a properly disable speculative decoding 7 months ago
  AlpinDale 696f2cd59c add phi3_small support with blocksparse attention 7 months ago
  AlpinDale 0d15aa3ab3 fix prefix caching for block manager v2 7 months ago
  AlpinDale 7d0884de9a fix mistral v0.3 weight loading 7 months ago
  AlpinDale e8b7f53321 allow prompt token IDs in the logits processor api 7 months ago
  AlpinDale f4ea11b982 feat: initial support for activation quantization 7 months ago
  Drake e1a142c179 Fix OpenAI chat completions compatibility (#559) 7 months ago
  AlpinDale 5b0c11d190 support pipeline parallel pynccl groups 7 months ago
  AlpinDale f6250c5516 move dockerfiles to root; fix cpu build 7 months ago
  AlpinDale d8667fcb98 improve gptq_marlin_24 prefill performance 7 months ago
  AlpinDale eb2c5c77df feat: enforce the max possible seqlen 7 months ago