david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high throughput, thanks to vLLM's Paged Attention. @ 517676249c8a5e3cb77b52ac69d646ef1ae90939

AlpinDale 6cecbbff6a fix: reduce memory footprint of cuda graph by adding output buffer		7 months ago
__init__.py	04b53d2db5 chore: add initializer files	1 year ago
cache_engine.py	f40b809d3b allow using v2 block manager with sliding window	7 months ago
cpu_model_runner.py	8d77c69cbd feat: support image processor and add llava example	7 months ago
cpu_worker.py	50b7c13db0 refactor: attention selector (#552)	8 months ago
embedding_model_runner.py	8d77c69cbd feat: support image processor and add llava example	7 months ago
model_runner.py	6cecbbff6a fix: reduce memory footprint of cuda graph by adding output buffer	7 months ago
neuron_model_runner.py	35ae01d7ba refactor: attention metadata term	8 months ago
neuron_worker.py	fca911ee0a vLLM Upstream Sync (#526)	8 months ago
worker.py	eb2c5c77df feat: enforce the max possible seqlen	7 months ago
worker_base.py	7194047318 remove vllm-nccl	7 months ago