david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's Paged Attention. @ tools-api

AlpinDale 008e646c7e chore: add support for up to 2048 block size (#715)		4 months ago
__init__.py	04b53d2db5 chore: add initializer files	1 year ago
cache_engine.py	bf88c8567e feat: mamba model support (#674)	4 months ago
cpu_model_runner.py	bf88c8567e feat: mamba model support (#674)	4 months ago
cpu_worker.py	bf88c8567e feat: mamba model support (#674)	4 months ago
embedding_model_runner.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
enc_dec_model_runner.py	0b8b407b6d feat: support profiling with multiple multi-modal inputs per prompt (#712)	4 months ago
model_runner.py	0b8b407b6d feat: support profiling with multiple multi-modal inputs per prompt (#712)	4 months ago
model_runner_base.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
neuron_model_runner.py	008e646c7e chore: add support for up to 2048 block size (#715)	4 months ago
neuron_worker.py	008e646c7e chore: add support for up to 2048 block size (#715)	4 months ago
openvino_model_runner.py	bf88c8567e feat: mamba model support (#674)	4 months ago
openvino_worker.py	bf88c8567e feat: mamba model support (#674)	4 months ago
tpu_model_runner.py	1c519cc6ac chore: set per-rank XLA cache for TPU (#714)	4 months ago
tpu_worker.py	1c519cc6ac chore: set per-rank XLA cache for TPU (#714)	4 months ago
utils.py	a0e446a17d feat: initial encoder-decoder support with BART model (#633)	4 months ago
worker.py	b03fa02397 refactor: base worker input refactor for multi-step (#683)	4 months ago
worker_base.py	f76f2a5af0 feat: add aphrodite plugin system (#705)	4 months ago
xpu_model_runner.py	0b8b407b6d feat: support profiling with multiple multi-modal inputs per prompt (#712)	4 months ago
xpu_worker.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago