| Name | Last commit | Commit message | Age |
|---|---|---|---|
| attention | 93cffaf446 | add flash_attn back | 7 months ago |
| common | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| distributed | c58589318f | remove the graph mode func | 7 months ago |
| endpoints | fe431bb840 | check for next port if current is unavailable | 7 months ago |
| engine | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| executor | eaa06fdd14 | fix some f-strings | 7 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago |
| lora | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| modeling | f970f3f3fb | add base class for VLMs | 7 months ago |
| processing | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| quantization | 8e11259e90 | missing triton autoconfig for rocm flash attn | 7 months ago |
| spec_decode | 236be273e5 | feat: tensor parallel speculative decoding (#554) | 7 months ago |
| task_handler | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| transformers_utils | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| __init__.py | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 7 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |