AlpinDale | 7d91e9e0f2 feat: CUDA graphs (#172) | 1 year ago
AlpinDale | 653da510d1 chore: rewrite InputMetadata (#143) | 1 year ago
AlpinDale | 1334a833a4 feat: AMD ROCm support (#95) | 1 year ago
AlpinDale | 2b1ba581f9 feat: re-implement GPTQ (#141) | 1 year ago
AlpinDale | 8223f85c1b feat: SqueezeLLM support (#140) | 1 year ago
AlpinDale | 1aab8a7d6f feat: speedup compilation times by 3x (#130) | 1 year ago
AlpinDale | 237d2ec28d fix: CPU OOM for large models (#128) | 1 year ago
AlpinDale | 8834ecf9de chore: clean up refactor endpoints (#98) | 1 year ago
AlpinDale | efc6f7fbec chore: reformats (#90) | 1 year ago
AlpinDale | 3d72f05c7b feat: flattened 1D tensor -> 2D tensor (#85) | 1 year ago
AlpinDale | 0495c50a3e GPTQ+exllama support (#21) | 1 year ago
AlpinDale | 69a4c32b01 fix: openai server (#19) | 1 year ago
AlpinDale | cbeeabeb9a feat: mistral support (#20) | 1 year ago
AlpinDale | 8576f8c1f8 fix ctxlen issues with large prompts | 1 year ago
AlpinDale | ca43123a30 add github action to auto-build wheels | 1 year ago
AlpinDale | 75c27d3e65 massive overhaul | 1 year ago
AlpinDale | 4cdf165ee9 fix engine args | 1 year ago
AlpinDale | d9c1d4f6e5 add awq support | 1 year ago
AlpinDale | 39beed0b87 Revert "Refactor AWQ support." | 1 year ago
AlpinDale | d09e27f5d4 Refactor AWQ support. | 1 year ago
AlpinDale | 6b9561ef07 adapt TGI incremental detokenization | 1 year ago
AlpinDale | d4cd18bd94 chore: allow user to specify model context length | 1 year ago
AlpinDale | 0115e55972 chore: add max log length | 1 year ago
AlpinDale | 45f6d9f923 initial refactor commit | 1 year ago
AlpinDale | 76b2e4a445 Merge dev branch into main (#7) | 1 year ago
AlpinDale | 97bb098066 fix: typo lol | 1 year ago
AlpinDale | f4bb602b74 chore: remove redundant import and minor refactor | 1 year ago
AlpinDale | 56077f0f29 upstream: trust remote code | 1 year ago
AlpinDale | 7a27bd5f2f fix: do not allow prompt to exceed max input len | 1 year ago
AlpinDale | 5169163403 chore: add tokenizer mode for slow/fast tokenizers | 1 year ago