Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| AlpinDale | 72229a94da | feat: better marlin kernels (#285) | 10 months ago |
| AlpinDale | 769b069e2e | AttributeError fix in OpenAI server | 10 months ago |
| AlpinDale | 23a7fd8cda | remove ooba endpoint, fix and add deprecation warning for kobold endpoint, fix case where kobold endpoint was always launched with openai (#284) | 10 months ago |
| AlpinDale | 13d850334e | fix: navi support (#283) | 10 months ago |
| AlpinDale | 9fa99215f8 | feat: add cubic sampling (#280) | 10 months ago |
| AlpinDale | 657aec0cbd | refactor: OpenAI endpoint (#261) | 10 months ago |
| 50h100a | 3df36ee07d | fix: logit bias logitproc (#278) | 10 months ago |
| AlpinDale | 4d04ade9ef | feat: fine-grained seeds (#279) | 10 months ago |
| Stefan Daniel Schwarz | 810ca83066 | fix+feat: docker compose (#264) | 10 months ago |
| AlpinDale | 16615784b3 | fix: prefix cache for turing gpus | 10 months ago |
| AlpinDale | 7dc73a779a | fix: properly perform garbage collection for lora (#277) | 10 months ago |
| AlpinDale | 697c06c4f5 | fix: LoRA support for mixtral (#276) | 10 months ago |
| AlpinDale | 4b80b42362 | fix: memory leaks due to nccl cuda graphs (#275) | 10 months ago |
| AlpinDale | e31c6f0b45 | feat: refactor modeling logic and support more models (#274) | 10 months ago |
| AlpinDale | 7d6ba53602 | feat: fused top-k kernels for MoE (#273) | 10 months ago |
| AlpinDale | a3cab09b69 | chore: logging env variable | 10 months ago |
| AlpinDale | 2c08aa5af4 | chore: remove eos token from output (#272) | 10 months ago |
| AlpinDale | 8e1cd54497 | fix: do not include fp8 for rocm (#271) | 10 months ago |
| AlpinDale | 6a63ab4ec3 | fix: remote encode request if using ray (#270) | 10 months ago |
| AlpinDale | 224b87b484 | feat: add fused mixtral moe support (#238) | 10 months ago |
| Thomas Xin | 43cf0e98a0 | fix: worker initialization on WSL (#260) | 10 months ago |
| swadical | 0527131e93 | fix: grammar logits processor (#268) | 10 months ago |
| AlpinDale | 2370dbcfd8 | feat: OPT model support (#266) | 10 months ago |
| AlpinDale | 4360684667 | fix: cuda version in wheel | 11 months ago |
| TearGosling | 80e8a14949 | feat: add pygchat Jinja template (#218) | 11 months ago |
| sgsdxzy | fe7844f2ef | feat: sharding and safetensors support for gguf conversion (#256) | 11 months ago |
| AlpinDale | 55c7a22ca6 | add t5 modeling code | 11 months ago |
| AlpinDale | 8635901c76 | fix: s-lora vocab embeddings | 11 months ago |
| AlpinDale | c76b611021 | docker: update the Dockerfile and push the latest image (#254) | 11 months ago |
| anon998 | 35b9033782 | fix: crash in quadratic sampling when batch > 1 (#253) | 11 months ago |