Author | Commit | Message | Date
AlpinDale | 9bbc75d2e3 | wip | 5 months ago
AlpinDale | 60af35bc34 | wip | 5 months ago
AlpinDale | 74cb1aad4e | wip | 5 months ago
AlpinDale | 5d965b34a7 | bitnet -> bitblas in reqs | 5 months ago
AlpinDale | 5884e0b904 | add bitnetforcausallm support | 5 months ago
AlpinDale | 2649f3f14e | aqlm works on pascal | 5 months ago
AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 5 months ago
AlpinDale | 344ddaac5a | properly disable speculative decoding | 5 months ago
AlpinDale | 696f2cd59c | add phi3_small support with blocksparse attention | 5 months ago
AlpinDale | 0d15aa3ab3 | fix prefix caching for block manager v2 | 5 months ago
AlpinDale | 7d0884de9a | fix mistral v0.3 weight loading | 5 months ago
AlpinDale | e8b7f53321 | allow prompt token IDs in the logits processor api | 5 months ago
AlpinDale | f4ea11b982 | feat: initial support for activation quantization | 5 months ago
Drake | e1a142c179 | Fix OpenAI chat completions compatibility (#559) | 5 months ago
AlpinDale | 5b0c11d190 | support pipeline parallel pynccl groups | 5 months ago
AlpinDale | f6250c5516 | move dockerfiles to root; fix cpu build | 5 months ago
AlpinDale | d8667fcb98 | improve gptq_marlin_24 prefill performance | 5 months ago
AlpinDale | eb2c5c77df | feat: enforce the max possible seqlen | 5 months ago
AlpinDale | 19a959a03e | prioritize user selection for attention | 5 months ago
AlpinDale | c1ed789835 | fix: typo in llama.py | 5 months ago
AlpinDale | 4e1ae004da | make mp the default distributed backend | 5 months ago
AlpinDale | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 5 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 5 months ago
AlpinDale | 6e626b902c | fix cutlass w8a8 kernels for cuda stream | 5 months ago
AlpinDale | 3bdeb3e116 | fix: clang formatting for all kernels (#558) | 5 months ago
AlpinDale | 04d22bf1a9 | add clang-format | 5 months ago
AlpinDale | 60e74e92fd | add rope_scaling arg | 5 months ago
AlpinDale | b8b63eb5ca | fix head_size check for flash attention backend | 5 months ago
AlpinDale | 8077af0b2f | add lora support for phi | 5 months ago
AlpinDale | 295cfb2f39 | add rope scaling for qwen2 | 5 months ago