| Name | Last commit | Commit message | Last updated |
| --- | --- | --- | --- |
| attention | c577c31aaa | feat: tree attention | 8 months ago |
| common | c577c31aaa | feat: tree attention | 8 months ago |
| distributed | b1caee23a6 | cache the p2p access check for memory saving | 8 months ago |
| endpoints | c577c31aaa | feat: tree attention | 8 months ago |
| engine | c577c31aaa | feat: tree attention | 8 months ago |
| executor | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 8 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 9 months ago |
| lora | fe17712f29 | fully working chunked prefill | 8 months ago |
| modeling | b28011e86e | fix: shard exl2 weights more evenly between ranks (#437) | 8 months ago |
| processing | c577c31aaa | feat: tree attention | 8 months ago |
| spec_decode | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 8 months ago |
| task_handler | c577c31aaa | feat: tree attention | 8 months ago |
| transformers_utils | 58b0616dd3 | feat: support sharded ggufs (#420) | 8 months ago |
| __init__.py | c2aaaefd57 | allow out-of-tree model registry | 8 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |