Latest commit by AlpinDale: de62ceb18c "refactor: eliminate parallel worker per-step task scheduling overhead" (7 months ago)
| Path | Commit | Message | Last change |
|------|--------|---------|-------------|
| attention | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| common | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| distributed | c58589318f | remove the graph mode func | 7 months ago |
| endpoints | fe431bb840 | check for next port if current is unavailable | 7 months ago |
| engine | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| executor | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago |
| lora | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| modeling | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| processing | b7667151e5 | fix scheduler being off by one for lora support | 7 months ago |
| quantization | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| spec_decode | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| task_handler | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| transformers_utils | 60e74e92fd | add rope_scaling arg | 7 months ago |
| __init__.py | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 7 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |