Latest commit: 93cffaf446 "add flash_attn back" by AlpinDale, 7 months ago
| Name | Commit | Last commit message | Last updated |
| --- | --- | --- | --- |
| attention | 93cffaf446 | add flash_attn back | 7 months ago |
| common | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| distributed | c58589318f | remove the graph mode func | 7 months ago |
| endpoints | fe431bb840 | check for next port if current is unavailable | 7 months ago |
| engine | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| executor | eaa06fdd14 | fix some f-strings | 7 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago |
| lora | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| modeling | f970f3f3fb | add base class for VLMs | 7 months ago |
| processing | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| quantization | 8e11259e90 | missing triton autoconfig for rocm flash attn | 7 months ago |
| spec_decode | 236be273e5 | feat: tensor parallel speculative decoding (#554) | 7 months ago |
| task_handler | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| transformers_utils | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| __init__.py | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 7 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |