Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| AlpinDale | ea0f57b233 | feat: allow further support for non-cuda devices (#247) | 11 months ago |
| AlpinDale | 4faf78ba29 | fix: grab correct quant config from revisions (#246) | 11 months ago |
| AlpinDale | 7760913873 | fix: garbage output from GPTQ (#245) | 11 months ago |
| 50h100a | f619c96c79 | fix: zero token output due to temperature bias (#243) | 11 months ago |
| 50h100a | 53a9c60442 | fix: logit processor declarations and application (#242) | 11 months ago |
| AlpinDale | 9ed45fec7c | fix: incorrect prometheus url | 11 months ago |
| AlpinDale | d2db4143fa | feat: add grafana for metrics (#240) | 11 months ago |
| AlpinDale | 1a94ccf3cf | fix: prefix cache fail with lora (#239) | 11 months ago |
| AlpinDale | 85c92acfb3 | fix: do not initialize all-reduce at world_size=1 | 11 months ago |
| AlpinDale | d9b65e6c5f | feat: DeepSeek MoE support (#237) | 11 months ago |
| AlpinDale | e73a92ad2f | fix: remove the mask for quadratic sampling (#236) | 11 months ago |
| AlpinDale | aebd68c632 | feat: backport kernels (#235) | 11 months ago |
| AlpinDale | bb158b6282 | fix: bump torch to 2.2.0 (#234) | 11 months ago |
| AlpinDale | 1c46fa31ad | feat: add quadratic sampling (#233) | 11 months ago |
| AlpinDale | f0dacc17dd | fix: remove fast-hadamard-transform in requirements | 11 months ago |
| AlpinDale | 5d288aa76c | feat: add fast hadamard transformation kernels (#232) | 11 months ago |
| AlpinDale | 12fb635f70 | readme: add docker | 1 year ago |
| AlpinDale | eb8698c7bd | readme: update with new benchmarks | 1 year ago |
| AlpinDale | 59df05f341 | feat: add `/metrics` route for kobold (#229) | 1 year ago |
| AlpinDale | c3a221eb02 | feat: GGUF, QuIP#, and Marlin support (#228) | 1 year ago |
| AlpinDale | 6305e6f3f2 | fix: no repeated IPC registration (#227) | 1 year ago |
| AlpinDale | 0adab894fe | feat: grammar support (#206) | 1 year ago |
| AlpinDale | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 1 year ago |
| AlpinDale | c0146ed00e | chore: slight refactor for async engine finish (#225) | 1 year ago |
| AlpinDale | 339c6aec53 | chore: bump ray version | 1 year ago |
| AlpinDale | 641bb0f6e9 | feat: add custom allreduce kernels (#224) | 1 year ago |
| AlpinDale | 26a717b49f | fix: use head_dim if available | 1 year ago |
| AlpinDale | 5053743c1c | feat: speedup AWQ (#223) | 1 year ago |
| AlpinDale | c0aac15421 | feat: S-LoRA support (#222) | 1 year ago |
| AlpinDale | 8fa608aeb7 | feat: replace Ray with NCCL for control plane comms (#221) | 1 year ago |