| Name | Commit | Message | Last updated |
| --- | --- | --- | --- |
| attention | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| common | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| distributed | c58589318f | remove the graph mode func | 7 months ago |
| endpoints | fe431bb840 | check for next port if current is unavailable | 7 months ago |
| engine | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| executor | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago |
| lora | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| modeling | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| processing | b7667151e5 | fix scheduler being off by one for lora support | 7 months ago |
| quantization | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| spec_decode | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| task_handler | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| transformers_utils | 60e74e92fd | add rope_scaling arg | 7 months ago |
| __init__.py | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 7 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |