david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high throughput, thanks to vLLM's Paged Attention.

AlpinDale 6212072245 api: support LoRA lineage and base model metadata management (#1072)		4 days ago
__init__.py	f8dfac6372 chore: attention refactor and upstream sync apr01 (#365)	9 months ago
cpu_executor.py	6212072245 api: support LoRA lineage and base model metadata management (#1072)	4 days ago
distributed_gpu_executor.py	0dfa6b60ec core: support logprobs with multi-step scheduling (#963)	2 weeks ago
executor_base.py	0dfa6b60ec core: support logprobs with multi-step scheduling (#963)	2 weeks ago
gpu_executor.py	3bb0f07461 chore: rename `task_handler` to `worker` (#985)	2 weeks ago
msgspec_utils.py	2f61644f6e SPMD optimizations (#824)	1 month ago
multiproc_gpu_executor.py	638c08d9dc fix: clean shutdown issues (#1047)	1 week ago
multiproc_worker_utils.py	9a7d5514c4 feat: introduce MQAphroditeEngine (#1056)	1 week ago
multiproc_xpu_executor.py	15cb8d5c26 xpu: support pipeline parallel (#932)	2 weeks ago
neuron_executor.py	3bb0f07461 chore: rename `task_handler` to `worker` (#985)	2 weeks ago
openvino_executor.py	3bb0f07461 chore: rename `task_handler` to `worker` (#985)	2 weeks ago
ray_gpu_executor.py	4737c22ab3 fix: pass `APHRODITE_ATTENTION_BACKEND` to ray workers (#1009)	1 week ago
ray_tpu_executor.py	6212072245 api: support LoRA lineage and base model metadata management (#1072)	4 days ago
ray_utils.py	6212072245 api: support LoRA lineage and base model metadata management (#1072)	4 days ago
ray_xpu_executor.py	673621a3d2 xpu: refactor the model runner for tensor parallelism (#910)	3 weeks ago
tpu_executor.py	4b1b658855 tpu: implement multi-step scheduling (#1046)	1 week ago
xpu_executor.py	3bb0f07461 chore: rename `task_handler` to `worker` (#985)	2 weeks ago