AlpinDale | ea0f57b233 | feat: allow further support for non-cuda devices (#247) | 11 months ago
AlpinDale | 4faf78ba29 | fix: grab correct quant config from revisions (#246) | 11 months ago
AlpinDale | 7760913873 | fix: garbage output from GPTQ (#245) | 11 months ago
50h100a | f619c96c79 | fix: zero token output due to temperature bias (#243) | 11 months ago
50h100a | 53a9c60442 | fix: logit processor declarations and application (#242) | 11 months ago
AlpinDale | 9ed45fec7c | fix: incorrect prometheus url | 11 months ago
AlpinDale | d2db4143fa | feat: add grafana for metrics (#240) | 11 months ago
AlpinDale | 1a94ccf3cf | fix: prefix cache fail with lora (#239) | 11 months ago
AlpinDale | 85c92acfb3 | fix: do not initialize all-reduce at world_size=1 | 11 months ago
AlpinDale | d9b65e6c5f | feat: DeepSeek MoE support (#237) | 11 months ago
AlpinDale | e73a92ad2f | fix: remove the mask for quadratic sampling (#236) | 11 months ago
AlpinDale | aebd68c632 | feat: backport kernels (#235) | 11 months ago
AlpinDale | bb158b6282 | fix: bump torch to 2.2.0 (#234) | 11 months ago
AlpinDale | 1c46fa31ad | feat: add quadratic sampling (#233) | 11 months ago
AlpinDale | f0dacc17dd | fix: remove fast-hadamard-transform in requirements | 11 months ago
AlpinDale | 5d288aa76c | feat: add fast hadamard transformation kernels (#232) | 11 months ago
AlpinDale | 12fb635f70 | readme: add docker | 1 year ago
AlpinDale | eb8698c7bd | readme: update with new benchmarks | 1 year ago
AlpinDale | 59df05f341 | feat: add `/metrics` route for kobold (#229) | 1 year ago
AlpinDale | c3a221eb02 | feat: GGUF, QuIP#, and Marlin support (#228) | 1 year ago
AlpinDale | 6305e6f3f2 | fix: no repeated IPC registration (#227) | 1 year ago
AlpinDale | 0adab894fe | feat: grammar support (#206) | 1 year ago
AlpinDale | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 1 year ago
AlpinDale | c0146ed00e | chore: slight refactor for async engine finish (#225) | 1 year ago
AlpinDale | 339c6aec53 | chore: bump ray version | 1 year ago
AlpinDale | 641bb0f6e9 | feat: add custom allreduce kernels (#224) | 1 year ago
AlpinDale | 26a717b49f | fix: use head_dim if available | 1 year ago
AlpinDale | 5053743c1c | feat: speedup AWQ (#223) | 1 year ago
AlpinDale | c0aac15421 | feat: S-LoRA support (#222) | 1 year ago
AlpinDale | 8fa608aeb7 | feat: replace Ray with NCCL for control plane comms (#221) | 1 year ago