Commit History

Author SHA1 Message Date
  AlpinDale c577c31aaa feat: tree attention 8 months ago
  AlpinDale b5a730d069 fix: options requests in the api (#439) 8 months ago
  AlpinDale f216601f18 fix: logging in the API server 8 months ago
  Pyroserenus 3de1c23a0e chore: update Kobold Lite Embed (#433) 8 months ago
  sgsdxzy b28011e86e fix: shard exl2 weights more evenly between ranks (#437) 8 months ago
  sgsdxzy a3b1602391 fix: rope scaling for cohere and qwen (#436) 8 months ago
  AlpinDale 60ca1e1e5e feat: add ngram prompt lookup decoding for speculative decoding (#438) 8 months ago
  AlpinDale d8c4193704 feat: Speculative Decoding using a draft model (#432) 8 months ago
  sgsdxzy 0d0c6b313c fix: linear bias of qkv layers in models (#430) 8 months ago
  sgsdxzy 1528ce50e5 fix: abort requests when the connection to /v1/completions is interrupted (#431) 8 months ago
  AlpinDale 76f36af704 feat: LM Format Enforcer support (#428) 8 months ago
  AlpinDale 4d338a38c2 fix engine_use_ray=True 8 months ago
  AlpinDale 140ebac03e fix the nsight profiling with ray 8 months ago
  AlpinDale 8d26cf3876 simplify model_executor logic 8 months ago
  AlpinDale f9f13a348d fix CPU blocks logger for CPU backend 8 months ago
  AlpinDale 73143eaea5 suppress import error for eetq 8 months ago
  50h100a f67b5be198 chore: port sampler+metadata changes from main to dev (#427) 8 months ago
  sgsdxzy 58b0616dd3 feat: support sharded ggufs (#420) 8 months ago
  sgsdxzy 589fe0c73e fix: split the exl2 weight loading and SQ+ init (#423) 8 months ago
  AlpinDale f1accfac9f add CLI app 8 months ago
  AlpinDale 309339ffd3 separate api server args into another file 8 months ago
  sgsdxzy f3b546e33a feat: support twe lm_head for quantized weights (#409) 8 months ago
  sgsdxzy 214151b04c fix: max_num_batched_tokens for chunked_prefill (#412) 8 months ago
  AlpinDale 1dccb03b17 incorrect comparison for hadamard and punica checks 8 months ago
  sgsdxzy 6a0a6360f1 fix: Allow setting config-path when converting ggufs. (#410) 8 months ago
  sgsdxzy fcfb72af24 Support arbitrary model in GGUF. (#381) 8 months ago
  AlpinDale bd0ddf1cfe feat: EETQ quantization (#408) 8 months ago
  AlpinDale b1caee23a6 cache the p2p access check for memory saving 8 months ago
  AlpinDale 373e0d3c01 fix neuron 8 months ago
  AlpinDale 28bcca2396 incorrect use of monotonic time in metrics logger 8 months ago