david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users at high speed (thanks to vLLM's Paged Attention). @ 93cffaf446430c2267e15593a856947e6e8c6d26

AlpinDale 93cffaf446 add flash_attn back		7 months ago
..
attention	93cffaf446 add flash_attn back	7 months ago
common	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
distributed	c58589318f remove the graph mode func	7 months ago
endpoints	fe431bb840 check for next port if current is unavailable	7 months ago
engine	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
executor	eaa06fdd14 fix some f-strings	7 months ago
kv_quant	e42a78381a feat: switch from pylint to ruff (#322)	1 year ago
lora	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
modeling	f970f3f3fb add base class for VLMs	7 months ago
processing	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
quantization	8e11259e90 missing triton autoconfig for rocm flash attn	7 months ago
spec_decode	236be273e5 feat: tensor parallel speculative decoding (#554)	7 months ago
task_handler	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
transformers_utils	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
__init__.py	be8154a8a0 feat: proper embeddings API with e5-mistral-7b support	7 months ago
py.typed	1c988a48b2 fix logging and add py.typed	1 year ago