AlpinDale | 0bf916eabd | Revert "feat: add support for chunked prefill + prefix caching (#871)" | 3 weeks ago
AlpinDale | afc9a28aa0 | chore: add AphroditeParameter support for FP8 quant (#902) | 3 weeks ago
AlpinDale | 2a60b8f8c9 | kernel: do not compile machete for cuda 11 and below (#901) | 3 weeks ago
AlpinDale | 64c05b969a | fix: `ShardedStateLoader` with fp8 quant (#900) | 3 weeks ago
AlpinDale | 132aa2abe4 | spec decode: add support for EAGLE (#899) | 3 weeks ago
AlpinDale | bfc3da41ae | feat: add torch.compile for GemmaRMSNorm (#898) | 3 weeks ago
AlpinDale | a00ab49e21 | api: add client timeouts for the ZeroMQ server (#897) | 3 weeks ago
AlpinDale | 908ff753a1 | fix: phi_3.5_v loading (#896) | 3 weeks ago
AlpinDale | e14223dce5 | kernel: use `cub::BlockReduce` instead of custom impl (#895) | 3 weeks ago
AlpinDale | ff4b7236d5 | build: fix invalid path for envs.py in setup (#894) | 3 weeks ago
AlpinDale | f831fd8312 | rocm: fix compile issues with rocm 6.2 (#893) | 3 weeks ago
AlpinDale | 65b71f5fcc | distributed: fix issue for when nodes have multiple network interfaces (#892) | 3 weeks ago
AlpinDale | 653d1a08d4 | feat: add support for audio models (#891) | 3 weeks ago
AlpinDale | 22a4cd4595 | core: fix spec decode metrics and envs circular import (#889) | 3 weeks ago
AlpinDale | 901900854e | chore: consolidate environment variables within one file (#882) | 3 weeks ago
AlpinDale | ce6e3d63f7 | api: better startup failure UX (#881) | 3 weeks ago
AlpinDale | db6a50fd5c | async: disable multi-step scheduling for sync engine (#880) | 3 weeks ago
AlpinDale | afadef06cd | build: pass `PYTHONPATH` from setup.py to cmake (#879) | 3 weeks ago
AlpinDale | b5aa11020b | api: fix crashes under very high loads (#878) | 3 weeks ago
Noah Peterson | 9fd2bfa02e | readme: fix paged attention hyperlink (#876) | 3 weeks ago
AlpinDale | f797294b29 | fix: `add_generation_template` -> `add_generation_prompt` in llm (#877) | 3 weeks ago
AlpinDale | f0cc35befe | sampler: pad dry sequence breakers tensor (#875) | 1 month ago
AlpinDale | 9288a98084 | spec decoding: set the draft model ctxlen to target model (#874) | 1 month ago
AlpinDale | 55b7ce56c1 | cpu: fix `mm_limits` initialization (#873) | 1 month ago
AlpinDale | 5bd4473bb6 | async: avoid premature exit in the async generator (#872) | 1 month ago
AlpinDale | abfd4465ca | feat: add support for chunked prefill + prefix caching (#871) | 1 month ago
AlpinDale | ef99a567b6 | fix: temp_last warning being repeated for every output token (#869) | 1 month ago
Naomiusearch | 4f9fea4c4d | fix: ROCm build (#817) | 1 month ago
50h100a | 9b569279fd | Merge pull request #868 from PygmalionAI/dry_zoom | 1 month ago
50h100a | fc3c1cd5a5 | this is getting its own commit because lint failures like that are exactly why people stop using linters | 1 month ago