AlpinDale | c577c31aaa | feat: tree attention | 8 months ago
AlpinDale | b5a730d069 | fix: options requests in the api (#439) | 8 months ago
AlpinDale | f216601f18 | fix: logging in the API server | 8 months ago
Pyroserenus | 3de1c23a0e | chore: update Kobold Lite Embed (#433) | 8 months ago
sgsdxzy | b28011e86e | fix: shard exl2 weights more evenly between ranks (#437) | 8 months ago
sgsdxzy | a3b1602391 | fix: rope scaling for cohere and qwen (#436) | 8 months ago
AlpinDale | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 8 months ago
AlpinDale | d8c4193704 | feat: Speculative Decoding using a draft model (#432) | 8 months ago
sgsdxzy | 0d0c6b313c | fix: linear bias of qkv layers in models (#430) | 8 months ago
sgsdxzy | 1528ce50e5 | fix: abort requests when the connection to /v1/completions is interrupted (#431) | 8 months ago
AlpinDale | 76f36af704 | feat: LM Format Enforcer support (#428) | 8 months ago
AlpinDale | 4d338a38c2 | fix engine_use_ray=True | 8 months ago
AlpinDale | 140ebac03e | fix the nsight profiling with ray | 8 months ago
AlpinDale | 8d26cf3876 | simplify model_executor logic | 8 months ago
AlpinDale | f9f13a348d | fix CPU blocks logger for CPU backend | 8 months ago
AlpinDale | 73143eaea5 | suppress import error for eetq | 8 months ago
50h100a | f67b5be198 | chore: port sampler+metadata changes from main to dev (#427) | 8 months ago
sgsdxzy | 58b0616dd3 | feat: support sharded ggufs (#420) | 8 months ago
sgsdxzy | 589fe0c73e | fix: split the exl2 weight loading and SQ+ init (#423) | 8 months ago
AlpinDale | f1accfac9f | add CLI app | 8 months ago
AlpinDale | 309339ffd3 | separate api server args into another file | 8 months ago
sgsdxzy | f3b546e33a | feat: support twe lm_head for quantized weights (#409) | 8 months ago
sgsdxzy | 214151b04c | fix: max_num_batched_tokens for chunked_prefill (#412) | 8 months ago
AlpinDale | 1dccb03b17 | incorrect comparison for hadamard and punica checks | 8 months ago
sgsdxzy | 6a0a6360f1 | fix: Allow setting config-path when converting ggufs. (#410) | 8 months ago
sgsdxzy | fcfb72af24 | Support arbitrary model in GGUF. (#381) | 8 months ago
AlpinDale | bd0ddf1cfe | feat: EETQ quantization (#408) | 8 months ago
AlpinDale | b1caee23a6 | cache the p2p access check for memory saving | 8 months ago
AlpinDale | 373e0d3c01 | fix neuron | 8 months ago
AlpinDale | 28bcca2396 | incorrect use of monotonic time in metrics logger | 8 months ago