david/aphrodite-engine @ e13a66925cf2d117b26d0e062e2c62f82dc7f7dd

PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to vLLM's Paged Attention).

AlpinDale ddb28a80a3 fix: bump torch for rocm, unify CUDA_VISIBLE_DEVICES for cuda and rocm		6 months ago
__init__.py	04b53d2db5 chore: add initializer files	1 year ago
cache_engine.py	5be90c3859 Mamba infrastrucuture support (#586)	6 months ago
cpu_model_runner.py	99680b2d23 feat: soft prompts (#589)	6 months ago
cpu_worker.py	99680b2d23 feat: soft prompts (#589)	6 months ago
embedding_model_runner.py	99680b2d23 feat: soft prompts (#589)	6 months ago
model_runner.py	99680b2d23 feat: soft prompts (#589)	6 months ago
model_runner_base.py	5be90c3859 Mamba infrastrucuture support (#586)	6 months ago
neuron_model_runner.py	4599c98f99 feat: dynamic image size support for VLMs	6 months ago
neuron_worker.py	ae04f57ec1 feat: Pipeline Parallel support (#581)	7 months ago
openvino_model_runner.py	4f7d212b70 feat: remove vision language config	6 months ago
openvino_worker.py	1ff6d4c3d7 feat: support pipeline parallel on indivisible GPU count (#587)	6 months ago
tpu_model_runner.py	1cb06835a0 fix: TPU multimodal kwargs and outlines installation in TPU docker	6 months ago
tpu_worker.py	1ff6d4c3d7 feat: support pipeline parallel on indivisible GPU count (#587)	6 months ago
worker.py	d9f4c36edd feat: Medusa speculative decoding support (#590)	6 months ago
worker_base.py	ddb28a80a3 fix: bump torch for rocm, unify CUDA_VISIBLE_DEVICES for cuda and rocm	6 months ago
xpu_model_runner.py	99680b2d23 feat: soft prompts (#589)	6 months ago
xpu_worker.py	99680b2d23 feat: soft prompts (#589)	6 months ago