| File | Commit | Message | Last change |
|---|---|---|---|
| __init__.py | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 11 months ago |
| cpu_executor.py | ef733aee43 | implement ExecuteModelData to reduce executor complexity | 7 months ago |
| distributed_gpu_executor.py | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| executor_base.py | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| gpu_executor.py | 236be273e5 | feat: tensor parallel speculative decoding (#554) | 7 months ago |
| multiproc_gpu_executor.py | a89c9a0e92 | fix: device ordinal issues with world_size and stuff | 7 months ago |
| multiproc_worker_utils.py | fa58ba87a3 | fix: only set executor backend to mp if not multi-node | 7 months ago |
| neuron_executor.py | ef733aee43 | implement ExecuteModelData to reduce executor complexity | 7 months ago |
| ray_gpu_executor.py | dfa59bc5f9 | fix: 16 GPUs in a cluster | 7 months ago |
| ray_utils.py | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 7 months ago |
| ray_xpu_executor.py | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 7 months ago |
| tpu_executor.py | fe21123a1c | feat: TPU support (#570) | 7 months ago |
| xpu_executor.py | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 7 months ago |