Commit History

Author SHA1 Message Date
  AlpinDale b03b4d4c8c fix: compute cutlass 3.x epilogues in fp32 instead of 16 7 months ago
  AlpinDale cdff8e89f9 feat: introduce `DraftModelRunner` 7 months ago
  AlpinDale 9868bb2290 chore: make it clear that '%' should NOT be in tensor dict keys 7 months ago
  AlpinDale b8650ec51d fix: better error message for MLPSpeculator 7 months ago
  AlpinDale 0886c361f4 feat: OpenVINO CPU backend (#576) 7 months ago
  AlpinDale d63690a0df chore: add fp8 examples 7 months ago
  AlpinDale b6ff0623a6 chore: clean up branding 7 months ago
  AlpinDale 85ef2fe8b1 chore: clean up placeholder symbols 7 months ago
  AlpinDale 1852d18326 chore: clean up inference examples 7 months ago
  AlpinDale e1c4cf1d50 chore: organize chat templates 7 months ago
  AlpinDale c0c336aaa3 refactor: registry for processing model inputs; quick_gelu; clip model support 7 months ago
  AlpinDale 426a13ab73 fix: pass multi_modal_kwargs to CPU model runner 7 months ago
  AlpinDale bb4da84623 fix: make sure multi modal kwargs can broadcast properly with ring buffer 7 months ago
  AlpinDale d2461161ec chore: optimize KV cache swapping for TPU 7 months ago
  AlpinDale fad45609b8 chore: remove logical token blocks (turns out they are not needed) 7 months ago
  AlpinDale b3643a7bd7 fix: min_tokens for when there are multiple eos tokens 7 months ago
  AlpinDale 51cfadeb29 fix: `MLPSpeculator` handling of `num_speculative_tokens` 7 months ago
  AlpinDale c5d8028668 fix: no need to redefine supports_vision and supports_lora in model class 7 months ago
  AlpinDale b81966c0da fix: missed phi3v 7 months ago
  AlpinDale 56e0b8223c chore: add base class for LoRA-supported models 7 months ago
  AlpinDale bc5ac9584a fix: make tensor_dict flattening/unflattening more generic 7 months ago
  AlpinDale dead030abf fix: cuda graph with MLPSpeculator 7 months ago
  AlpinDale 271a680026 feat: inference support for PowerPC ISA 7 months ago
  AlpinDale 8b626e4032 fix: cpu kv cache allocation for TPU 7 months ago
  AlpinDale fcd58614f4 feat: support parallel sampling and swapping in TPU 7 months ago
  AlpinDale 5b464d36ea feat: bias epilogue support for cutlass kernels 7 months ago
  AlpinDale b16173b41e chore: add minimum concurrency for XPU 7 months ago
  AlpinDale af1286f9fa fix: kv cache size calculation on TPUs 7 months ago
  AlpinDale ecd4460d55 fix: support 2D inputs for embeddings 7 months ago
  AlpinDale 66be475aae fix: shm broadcast when the queue size is full 7 months ago