Commit History

Author | SHA1 | Message | Date
AlpinDale | b26a014b12 | fix: prometheus.yaml path in monitoring example (#969) | 2 weeks ago
AlpinDale | 5bec8fbb1b | tpu: add support for async postprocessing (#968) | 2 weeks ago
AlpinDale | a8bdd488b9 | distributed: support pipeline parallelism for internvl and internlm2 (#965) | 2 weeks ago
AlpinDale | cbd51a208a | ci: bump to 0.6.5 (#964) | 2 weeks ago
AlpinDale | 0dfa6b60ec | core: support logprobs with multi-step scheduling (#963) | 2 weeks ago
AlpinDale | 34e8606e81 | vlm: do not allow max_model_len overflow (#962) | 2 weeks ago
AlpinDale | 6bdff60aab | quant: support pre-quanted bitsandbytes checkpoints (#961) | 2 weeks ago
AlpinDale | ba6d798784 | neuron: support for context length and token bucketing (#960) | 2 weeks ago
AlpinDale | f4b62bf803 | quant: update tpu_int8 to use AphroditeParameters (#959) | 2 weeks ago
AlpinDale | 9ff3239ce2 | fix: gguf vocab embddings in TP (#958) | 2 weeks ago
AlpinDale | 22b8096006 | misc: extend cuda graph capture size for H200 (#957) | 2 weeks ago
AlpinDale | d6cbbba95f | Revert "fix: issues with flashinfer fp8 kv (#950)" (#956) | 2 weeks ago
AlpinDale | 5be6225f38 | core: support multi-step scheduling w/ async post-processor (#955) | 2 weeks ago
AlpinDale | 564d197687 | spec decode: match the original rank computation impl for spec decoding (#954) | 2 weeks ago
AlpinDale | 2aabf8fcf7 | vlm: fix errors on ragged NestedTensors (#953) | 2 weeks ago
AlpinDale | ea59784f59 | tpu: remove torch._dynamo.reset() (#952) | 2 weeks ago
AlpinDale | 39b2e83ac3 | api: optimize zeromq frontend performance (#951) | 2 weeks ago
AlpinDale | cef6da8863 | fix: issues with flashinfer fp8 kv (#950) | 2 weeks ago
AlpinDale | 0e5cf7f840 | tpu: avoid dynamo guard eval overhead (#949) | 2 weeks ago
AlpinDale | bf4a4d8516 | fix: do not register punica with torch if using older torch (#948) | 2 weeks ago
AlpinDale | a90d41d908 | tests: add kernel tests for causal_conv1d and mamba_ssm (#947) | 2 weeks ago
AlpinDale | fcfcfc65e1 | quants: add triton kernels for AWQ (#946) | 2 weeks ago
AlpinDale | a62f0925fe | update flashinfer test (#945) | 2 weeks ago
AlpinDale | 4ddc14d653 | core: use flashinfer for FP8 KV when available (#944) | 2 weeks ago
khanonnie | e1eb7fbedc | fix: SentencePieceTokenizer error when using mistral tokenizer mode (#943) | 2 weeks ago
AlpinDale | 689ed70f4e | vlm: fix persimmon and fuyu issues with transformers 4.45 (#942) | 2 weeks ago
AlpinDale | 09324ea2ea | vlm: fix incompatibility nested tensors and multi-image llava-next (#941) | 2 weeks ago
AlpinDale | c5c09720b0 | api: log prompt truncation (#940) | 2 weeks ago
AlpinDale | 0e2bfccda0 | core: add virtual engine for async outproc (#939) | 2 weeks ago
AlpinDale | f1ea7711bd | core: do not compile ScalarType for torch < 2.4.0 (#938) | 2 weeks ago