AlpinDale | 5ee340db84 | better naming | 11 months ago
AlpinDale | 7085b074e6 | fix pydantic v2 issues | 11 months ago
AlpinDale | 4ea4f5943e | refactor openai endpoints and add function calls | 11 months ago
AlpinDale | 85c92acfb3 | fix: do not initialize all-reduce at world_size=1 | 11 months ago
AlpinDale | d9b65e6c5f | feat: DeepSeek MoE support (#237) | 11 months ago
AlpinDale | e73a92ad2f | fix: remove the mask for quadratic sampling (#236) | 11 months ago
AlpinDale | aebd68c632 | feat: backport kernels (#235) | 11 months ago
AlpinDale | bb158b6282 | fix: bump torch to 2.2.0 (#234) | 11 months ago
AlpinDale | 1c46fa31ad | feat: add quadratic sampling (#233) | 11 months ago
AlpinDale | f0dacc17dd | fix: remove fast-hadamard-transform in requirements | 11 months ago
AlpinDale | 5d288aa76c | feat: add fast hadamard transformation kernels (#232) | 11 months ago
AlpinDale | 12fb635f70 | readme: add docker | 11 months ago
AlpinDale | eb8698c7bd | readme: update with new benchmarks | 11 months ago
AlpinDale | 59df05f341 | feat: add `/metrics` route for kobold (#229) | 11 months ago
AlpinDale | c3a221eb02 | feat: GGUF, QuIP#, and Marlin support (#228) | 11 months ago
AlpinDale | 6305e6f3f2 | fix: no repeated IPC registration (#227) | 11 months ago
AlpinDale | 0adab894fe | feat: grammar support (#206) | 11 months ago
AlpinDale | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 11 months ago
AlpinDale | c0146ed00e | chore: slight refactor for async engine finish (#225) | 11 months ago
AlpinDale | 339c6aec53 | chore: bump ray version | 11 months ago
AlpinDale | 641bb0f6e9 | feat: add custom allreduce kernels (#224) | 11 months ago
AlpinDale | 26a717b49f | fix: use head_dim if available | 11 months ago
AlpinDale | 5053743c1c | feat: speedup AWQ (#223) | 11 months ago
AlpinDale | c0aac15421 | feat: S-LoRA support (#222) | 11 months ago
AlpinDale | 8fa608aeb7 | feat: replace Ray with NCCL for control plane comms (#221) | 11 months ago
AlpinDale | 3188d5690c | fix: logprobs at -inf (#219) | 11 months ago
AlpinDale | a39eeb7188 | fix: logprobs for dynatemp (#215) | 11 months ago
Stefan Gligorijevic | 9e7e108dc8 | chore: clamp dynatemp_min (#214) | 11 months ago
AlpinDale | 60f072ff6f | chore: update klite embed and kcpp version (#212) | 11 months ago
AlpinDale | 97f37c1cb2 | fix: use tensor parallel for quantized mixtral (#213) | 11 months ago