Commit History

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| AlpinDale | 13e5ffd456 | fix distributed_executor_backend in args | 6 months ago |
| AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 6 months ago |
| AlpinDale | c6a501f682 | add multiprocessing executor; make ray optional | 6 months ago |
| AlpinDale | 0cea453d36 | automatically detect tensorized models | 6 months ago |
| AlpinDale | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 6 months ago |
| AlpinDale | 4acf34417a | feat: add DeepSpeedFP quantization for all models | 6 months ago |
| AlpinDale | 197a6d2c16 | auto disable speculative decoding by the running queue size | 6 months ago |
| AlpinDale | 21ce19b3ea | blocks_to_copy dict -> torch.Tensor | 6 months ago |
| Brian Dashore | 5533ab845e | feat: add uvloop (#550) | 6 months ago |
| AlpinDale | 35ae01d7ba | refactor: attention metadata term | 6 months ago |
| AlpinDale | 723c6acb84 | re-add ngram speculative decoding | 6 months ago |
| AlpinDale | e87c32bed3 | feat: full tensor parallel for LoRA layers (#545) | 6 months ago |
| AlpinDale | 46159b107a | formatting: pt1 | 7 months ago |
| AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 7 months ago |
| AlpinDale | 42998e423c | better quant verification | 8 months ago |
| AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago |
| AlpinDale | 78d66f16d1 | Chunked Prefill Part 1 (#384) | 10 months ago |
| AlpinDale | feb5840f2a | feat: async tokenization (#374) | 10 months ago |
| AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 10 months ago |
| AlpinDale | c41462cfcd | feat: exllamav2 quantization (#305) | 10 months ago |
| AlpinDale | 9810daa699 | feat: INT8 KV Cache (#298) | 11 months ago |
| AlpinDale | e0c35bb353 | feat: bitsandbytes and `--load-in{4,8}bit` support (#294) | 11 months ago |
| AlpinDale | 705821a7fe | feat: AQLM quantization support (#293) | 11 months ago |
| AlpinDale | ac82b67f75 | feat: naive context shift and various QoL changes (#289) | 11 months ago |
| AlpinDale | 72229a94da | feat: better marlin kernels (#285) | 11 months ago |
| AlpinDale | 657aec0cbd | refactor: OpenAI endpoint (#261) | 11 months ago |
| AlpinDale | 4d04ade9ef | feat: fine-grained seeds (#279) | 11 months ago |
| AlpinDale | ea0f57b233 | feat: allow further support for non-cuda devices (#247) | 1 year ago |
| AlpinDale | c3a221eb02 | feat: GGUF, QuIP#, and Marlin support (#228) | 1 year ago |
| AlpinDale | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 1 year ago |