a1d8ab9f3e | fix: lora on quantized models (barred gguf) (#292) | AlpinDale | 10 months ago
ac82b67f75 | feat: naive context shift and various QoL changes (#289) | AlpinDale | 10 months ago
72229a94da | feat: better marlin kernels (#285) | AlpinDale | 10 months ago
657aec0cbd | refactor: OpenAI endpoint (#261) | AlpinDale | 10 months ago
842912d022 | feat: on-the-fly gguf conversion (#250) | AlpinDale | 11 months ago
ea0f57b233 | feat: allow further support for non-cuda devices (#247) | AlpinDale | 11 months ago
c3a221eb02 | feat: GGUF, QuIP#, and Marlin support (#228) | AlpinDale | 1 year ago
31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | AlpinDale | 1 year ago
641bb0f6e9 | feat: add custom allreduce kernels (#224) | AlpinDale | 1 year ago
26a717b49f | fix: use head_dim if available | AlpinDale | 1 year ago
c0aac15421 | feat: S-LoRA support (#222) | AlpinDale | 1 year ago
8fa608aeb7 | feat: replace Ray with NCCL for control plane comms (#221) | AlpinDale | 1 year ago
15a0454172 | feat: FP8 KV Cache (#185) | AlpinDale | 1 year ago
b9b295d74e | chore: backlogs 1 (#191) | AlpinDale | 1 year ago
7d91e9e0f2 | feat: CUDA graphs (#172) | AlpinDale | 1 year ago
725be3e0de | feat: mixtral HF with expert parallelism (#167) | AlpinDale | 1 year ago
35e9cf707c | chore: force pt for mixtral (#164) | AlpinDale | 1 year ago
653da510d1 | chore: rewrite InputMetadata (#143) | AlpinDale | 1 year ago
1334a833a4 | feat: AMD ROCm support (#95) | AlpinDale | 1 year ago
63c28919a0 | Revert "fix: correct auto ntk scaling_factor for 4k ctx case" (#149) | AlpinDale | 1 year ago
2b1ba581f9 | feat: re-implement GPTQ (#141) | AlpinDale | 1 year ago
8223f85c1b | feat: SqueezeLLM support (#140) | AlpinDale | 1 year ago
237d2ec28d | fix: CPU OOM for large models (#128) | AlpinDale | 1 year ago
0d51eac374 | feat: awq for all models (#124) | AlpinDale | 1 year ago
fd18a1d956 | fix: get_tensor instead of pysafeslice | AlpinDale | 1 year ago
5ea6889cea | chore: read from quantization_config (#123) | AlpinDale | 1 year ago
3459f1c185 | feat: usage stats for OpenAI endpoint (#122) | AlpinDale | 1 year ago
1323b5456c | parse torch.dtype properly (#119) | AlpinDale | 1 year ago
e7b6a2d5a0 | chore: tensor parallel refactors part 2 (#116) | AlpinDale | 1 year ago
5175605f8d | fix: yarn (#112) | AlpinDale | 1 year ago