AlpinDale | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 9 months ago
AlpinDale | d8c4193704 | feat: Speculative Decoding using a draft model (#432) | 9 months ago
AlpinDale | 76f36af704 | feat: LM Format Enforcer support (#428) | 9 months ago
AlpinDale | bd0ddf1cfe | feat: EETQ quantization (#408) | 9 months ago
AlpinDale | 4d33ce60da | feat: Triton flash attention backend for ROCm (#407) | 9 months ago
AlpinDale | 071269e406 | feat: FP8 E4M3 KV Cache (#405) | 9 months ago
AlpinDale | 6f00203041 | refactor scheduler for chunked prefill, remove reorder policy for now | 9 months ago
AlpinDale | 9aaeb5d349 | add speculative config and arg for later | 9 months ago
AlpinDale | a304f76d89 | feat: Intel CPU support (#403) | 9 months ago
AlpinDale | fa083286e3 | Speculative Decoding Part 4: Lookahead scheduling (#402) | 9 months ago
AlpinDale | f845a661dd | Chunked Prefill Part 2: data update | 10 months ago
AlpinDale | eff5eb16c5 | ruff | 10 months ago
AlpinDale | 753f6dc51b | add v2 block manager | 10 months ago
AlpinDale | 7b9c08afae | vision model support | 10 months ago
AlpinDale | b738554558 | add reorder scheduler policy | 10 months ago
AlpinDale | 1ba9ff78cd | add scheduler delay factor | 10 months ago
AlpinDale | 78d66f16d1 | Chunked Prefill Part 1 (#384) | 10 months ago
AlpinDale | feb5840f2a | feat: async tokenization (#374) | 10 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 10 months ago
AlpinDale | c41462cfcd | feat: exllamav2 quantization (#305) | 10 months ago
AlpinDale | 9810daa699 | feat: INT8 KV Cache (#298) | 11 months ago
AlpinDale | e0c35bb353 | feat: bitsandbytes and `--load-in{4,8}bit` support (#294) | 11 months ago
AlpinDale | 705821a7fe | feat: AQLM quantization support (#293) | 11 months ago
AlpinDale | ac82b67f75 | feat: naive context shift and various QoL changes (#289) | 11 months ago
AlpinDale | 72229a94da | feat: better marlin kernels (#285) | 11 months ago
AlpinDale | 657aec0cbd | refactor: OpenAI endpoint (#261) | 11 months ago
AlpinDale | 4d04ade9ef | feat: fine-grained seeds (#279) | 11 months ago
AlpinDale | ea0f57b233 | feat: allow further support for non-cuda devices (#247) | 1 year ago
AlpinDale | c3a221eb02 | feat: GGUF, QuIP#, and Marlin support (#228) | 1 year ago
AlpinDale | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 1 year ago