| Name | Commit | Message | Last updated |
| --- | --- | --- | --- |
| attention | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| common | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| distributed | c58589318f | remove the graph mode func | 7 months ago |
| endpoints | fe431bb840 | check for next port if current is unavailable | 7 months ago |
| engine | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| executor | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago |
| lora | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| modeling | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| processing | b7667151e5 | fix scheduler being off by one for lora support | 7 months ago |
| quantization | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago |
| spec_decode | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| task_handler | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| transformers_utils | 60e74e92fd | add rope_scaling arg | 7 months ago |
| __init__.py | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 7 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |