Commit History

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 7 months ago |
| AlpinDale | f6250c5516 | move dockerfiles to root; fix cpu build | 7 months ago |
| AlpinDale | 4e1ae004da | make mp the default distributed backend | 7 months ago |
| AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| AlpinDale | 60e74e92fd | add rope_scaling arg | 7 months ago |
| AlpinDale | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| AlpinDale | 7bcff4ac03 | implement sharded state dict | 7 months ago |
| AlpinDale | 13e5ffd456 | fix distributed_executor_backend in args | 7 months ago |
| AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 7 months ago |
| AlpinDale | c6a501f682 | add multiprocessing executor; make ray optional | 7 months ago |
| AlpinDale | 0cea453d36 | automatically detect tensorized models | 7 months ago |
| AlpinDale | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 7 months ago |
| AlpinDale | 4acf34417a | feat: add DeepSpeedFP quantization for all models | 7 months ago |
| AlpinDale | 197a6d2c16 | auto disable speculative decoding by the running queue size | 7 months ago |
| AlpinDale | 21ce19b3ea | blocks_to_copy dict -> torch.Tensor | 7 months ago |
| Brian Dashore | 5533ab845e | feat: add uvloop (#550) | 7 months ago |
| AlpinDale | 35ae01d7ba | refactor: attention metadata term | 7 months ago |
| AlpinDale | 723c6acb84 | re-add ngram speculative decoding | 7 months ago |
| AlpinDale | e87c32bed3 | feat: full tensor parallel for LoRA layers (#545) | 7 months ago |
| AlpinDale | 46159b107a | formatting: pt1 | 8 months ago |
| AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 8 months ago |
| AlpinDale | 42998e423c | better quant verification | 9 months ago |
| AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |
| AlpinDale | 78d66f16d1 | Chunked Prefill Part 1 (#384) | 11 months ago |
| AlpinDale | feb5840f2a | feat: async tokenization (#374) | 11 months ago |
| AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 11 months ago |
| AlpinDale | c41462cfcd | feat: exllamav2 quantization (#305) | 1 year ago |
| AlpinDale | 9810daa699 | feat: INT8 KV Cache (#298) | 1 year ago |
| AlpinDale | e0c35bb353 | feat: bitsandbytes and `--load-in{4,8}bit` support (#294) | 1 year ago |
| AlpinDale | 705821a7fe | feat: AQLM quantization support (#293) | 1 year ago |