david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed (thanks to vLLM's PagedAttention). Commit 93cffaf446430c2267e15593a856947e6e8c6d26

AlpinDale 9e73559eba make use of batched rotary embedding kernels to support long context lora		7 months ago
__init__.py	04b53d2db5 chore: add initializer files	1 year ago
cache_engine.py	50b7c13db0 refactor: attention selector (#552)	8 months ago
cpu_model_runner.py	a94de94c44 refactor: combine the prefill and decode into a single API (#553)	7 months ago
cpu_worker.py	50b7c13db0 refactor: attention selector (#552)	8 months ago
embedding_model_runner.py	a94de94c44 refactor: combine the prefill and decode into a single API (#553)	7 months ago
model_runner.py	9e73559eba make use of batched rotary embedding kernels to support long context lora	7 months ago
neuron_model_runner.py	35ae01d7ba refactor: attention metadata term	8 months ago
neuron_worker.py	fca911ee0a vLLM Upstream Sync (#526)	8 months ago
worker.py	236be273e5 feat: tensor parallel speculative decoding (#554)	7 months ago
worker_base.py	ef733aee43 implement ExecuteModelData to reduce executor complexity	8 months ago