david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's Paged Attention. @ 93cffaf446430c2267e15593a856947e6e8c6d26

AlpinDale 93cffaf446 add flash_attn back		7 months ago
..
attention	93cffaf446 add flash_attn back	7 months ago
common	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
distributed	c58589318f remove the graph mode func	7 months ago
endpoints	fe431bb840 check for next port if current is unavailable	7 months ago
engine	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
executor	eaa06fdd14 fix some f-strings	7 months ago
kv_quant	e42a78381a feat: switch from pylint to ruff (#322)	1 year ago
lora	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
modeling	f970f3f3fb add base class for VLMs	7 months ago
processing	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
quantization	8e11259e90 missing triton autoconfig for rocm flash attn	7 months ago
spec_decode	236be273e5 feat: tensor parallel speculative decoding (#554)	7 months ago
task_handler	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
transformers_utils	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
__init__.py	be8154a8a0 feat: proper embeddings API with e5-mistral-7b support	7 months ago
py.typed	1c988a48b2 fix logging and add py.typed	1 year ago