.. |
attention
|
f40b809d3b
allow using v2 block manager with sliding window
|
7 months ago |
common
|
90ceab32ff
refactor: consolidate prompt args to LLM engines
|
7 months ago |
distributed
|
5b0c11d190
support pipeline parallel pynccl groups
|
7 months ago |
endpoints
|
90ceab32ff
refactor: consolidate prompt args to LLM engines
|
7 months ago |
engine
|
6785d78d82
fix: do not expose EOS token in the API
|
7 months ago |
executor
|
5b0c11d190
support pipeline parallel pynccl groups
|
7 months ago |
kv_quant
|
e42a78381a
feat: switch from pylint to ruff (#322)
|
1 year ago |
lora
|
5b0c11d190
support pipeline parallel pynccl groups
|
7 months ago |
modeling
|
ac79d115b3
add guards for prefix caching, fp8, chunked, etc
|
7 months ago |
processing
|
f40b809d3b
allow using v2 block manager with sliding window
|
7 months ago |
quantization
|
2649f3f14e
aqlm works on pascal
|
7 months ago |
spec_decode
|
344ddaac5a
properly disable speculative decoding
|
7 months ago |
task_handler
|
e4ea3da1ad
fix: tensor parallel with embedding model
|
7 months ago |
transformers_utils
|
60e74e92fd
add rope_scaling arg
|
7 months ago |
__init__.py
|
be8154a8a0
feat: proper embeddings API with e5-mistral-7b support
|
7 months ago |
py.typed
|
1c988a48b2
fix logging and add py.typed
|
1 year ago |