AlpinDale | 72229a94da feat: better marlin kernels (#285) | 10 months ago
AlpinDale | 769b069e2e AttributeError fix in OpenAI server | 10 months ago
AlpinDale | 23a7fd8cda remove ooba endpoint, fix and add deprecation warning for kobold endpoint, fix case where kobold endpoint was always launched with openai (#284) | 10 months ago
AlpinDale | 13d850334e fix: navi support (#283) | 10 months ago
AlpinDale | 9fa99215f8 feat: add cubic sampling (#280) | 10 months ago
AlpinDale | 657aec0cbd refactor: OpenAI endpoint (#261) | 10 months ago
50h100a | 3df36ee07d fix: logit bias logitproc (#278) | 10 months ago
AlpinDale | 4d04ade9ef feat: fine-grained seeds (#279) | 10 months ago
Stefan Daniel Schwarz | 810ca83066 fix+feat: docker compose (#264) | 10 months ago
AlpinDale | 16615784b3 fix: prefix cache for turing gpus | 10 months ago
AlpinDale | 7dc73a779a fix: properly perform garbage collection for lora (#277) | 10 months ago
AlpinDale | 697c06c4f5 fix: LoRA support for mixtral (#276) | 10 months ago
AlpinDale | 4b80b42362 fix: memory leaks due to nccl cuda graphs (#275) | 10 months ago
AlpinDale | e31c6f0b45 feat: refactor modeling logic and support more models (#274) | 10 months ago
AlpinDale | 7d6ba53602 feat: fused top-k kernels for MoE (#273) | 10 months ago
AlpinDale | a3cab09b69 chore: logging env variable | 10 months ago
AlpinDale | 2c08aa5af4 chore: remove eos token from output (#272) | 10 months ago
AlpinDale | 8e1cd54497 fix: do not include fp8 for rocm (#271) | 10 months ago
AlpinDale | 6a63ab4ec3 fix: remote encode request if using ray (#270) | 10 months ago
AlpinDale | 224b87b484 feat: add fused mixtral moe support (#238) | 10 months ago
Thomas Xin | 43cf0e98a0 fix: worker initialization on WSL (#260) | 10 months ago
swadical | 0527131e93 fix: grammar logits processor (#268) | 10 months ago
AlpinDale | 2370dbcfd8 feat: OPT model support (#266) | 10 months ago
AlpinDale | 4360684667 fix: cuda version in wheel | 11 months ago
TearGosling | 80e8a14949 feat: add pygchat Jinja template (#218) | 11 months ago
sgsdxzy | fe7844f2ef feat: sharding and safetensors support for gguf conversion (#256) | 11 months ago
AlpinDale | 55c7a22ca6 add t5 modeling code | 11 months ago
AlpinDale | 8635901c76 fix: s-lora vocab embeddings | 11 months ago
AlpinDale | c76b611021 docker: update the Dockerfile and push the latest image (#254) | 11 months ago
anon998 | 35b9033782 fix: crash in quadratic sampling when batch > 1 (#253) | 11 months ago