Commit History

Author SHA1 Message Date
  AlpinDale c577c31aaa feat: tree attention 8 months ago
  AlpinDale b5a730d069 fix: options requests in the api (#439) 8 months ago
  AlpinDale f216601f18 fix: logging in the API server 8 months ago
  Pyroserenus 3de1c23a0e chore: update Kobold Lite Embed (#433) 8 months ago
  sgsdxzy b28011e86e fix: shard exl2 weights more evenly between ranks (#437) 8 months ago
  sgsdxzy a3b1602391 fix: rope scaling for cohere and qwen (#436) 8 months ago
  AlpinDale 60ca1e1e5e feat: add ngram prompt lookup decoding for speculative decoding (#438) 8 months ago
  AlpinDale d8c4193704 feat: Speculative Decoding using a draft model (#432) 8 months ago
  sgsdxzy 0d0c6b313c fix: linear bias of qkv layers in models (#430) 8 months ago
  sgsdxzy 1528ce50e5 fix: abort requests when the connection to /v1/completions is interrupted (#431) 8 months ago
  AlpinDale 76f36af704 feat: LM Format Enforcer support (#428) 8 months ago
  AlpinDale 4d338a38c2 fix engine_use_ray=True 8 months ago
  AlpinDale 140ebac03e fix the nsight profiling with ray 8 months ago
  AlpinDale 8d26cf3876 simplify model_executor logic 8 months ago
  AlpinDale f9f13a348d fix CPU blocks logger for CPU backend 8 months ago
  AlpinDale 73143eaea5 suppress import error for eetq 8 months ago
  50h100a f67b5be198 chore: port sampler+metadata changes from main to dev (#427) 8 months ago
  sgsdxzy 58b0616dd3 feat: support sharded ggufs (#420) 8 months ago
  sgsdxzy 589fe0c73e fix: split the exl2 weight loading and SQ+ init (#423) 8 months ago
  AlpinDale f1accfac9f add CLI app 8 months ago
  AlpinDale 309339ffd3 separate api server args into another file 8 months ago
  sgsdxzy f3b546e33a feat: support twe lm_head for quantized weights (#409) 8 months ago
  sgsdxzy 214151b04c fix: max_num_batched_tokens for chunked_prefill (#412) 8 months ago
  AlpinDale 1dccb03b17 incorrect comparison for hadamard and punica checks 8 months ago
  sgsdxzy 6a0a6360f1 fix: Allow setting config-path when converting ggufs. (#410) 8 months ago
  sgsdxzy fcfb72af24 Support arbitrary model in GGUF. (#381) 8 months ago
  AlpinDale bd0ddf1cfe feat: EETQ quantization (#408) 8 months ago
  AlpinDale b1caee23a6 cache the p2p access check for memory saving 8 months ago
  AlpinDale 373e0d3c01 fix neuron 8 months ago
  AlpinDale 28bcca2396 incorrect use of monotonic time in metrics logger 8 months ago