Commit History

Author     SHA1        Message                                                                 Date
AlpinDale  60ca1e1e5e  feat: add ngram prompt lookup decoding for speculative decoding (#438)  9 months ago
AlpinDale  d8c4193704  feat: Speculative Decoding using a draft model (#432)                   9 months ago
AlpinDale  140ebac03e  fix the nsight profiling with ray                                       9 months ago
AlpinDale  8d26cf3876  simplify model_executor logic                                           9 months ago
AlpinDale  4d33ce60da  feat: Triton flash attention backend for ROCm (#407)                    9 months ago
AlpinDale  893c791152  fix TP for llava                                                        9 months ago
AlpinDale  9aaeb5d349  add speculative config and arg for later                                9 months ago
AlpinDale  10e708726e  enable multi-node inference                                             10 months ago
AlpinDale  753f6dc51b  add v2 block manager                                                    10 months ago
AlpinDale  41f5af0426  add python nccl wrapper, remove cupy                                    10 months ago
AlpinDale  7b9c08afae  vision model support                                                    10 months ago
AlpinDale  2319b411ce  refactor: neuron support                                                10 months ago
AlpinDale  0f6d56b07f  feat: model executor refactor (#367)                                    10 months ago
AlpinDale  f8dfac6372  chore: attention refactor and upstream sync apr01 (#365)                10 months ago