PygmalionAI's large-scale inference engine
pygmalion.chat

It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to vLLM's Paged Attention).

85 Revīzijas

77 Atzari

36 Laidieni

AlpinDale 16df1763c8 fix: typos in the attention file		1 gadu atpakaļ
aphrodite	16df1763c8 fix: typos in the attention file	1 gadu atpakaļ
assets	fefbf029c9 revert previous commit	1 gadu atpakaļ
examples	c240ac58e0 chore: update openai example	1 gadu atpakaļ
kernels	081545bde6 fix: various CUDA kernel tweaks	1 gadu atpakaļ
.gitignore	16df1763c8 fix: typos in the attention file	1 gadu atpakaļ
LICENSE	b9bcbe5d4e Change license from Apache 2.0 to AGPLv3	1 gadu atpakaļ
README.md	61f4452a92 readme: minor tweak	1 gadu atpakaļ
requirements.txt	188ba960be feat: add openai chat completion	1 gadu atpakaļ
setup.py	d8105984b8 fix: update setuptools again	1 gadu atpakaļ

Breathing Life into Language

Aphrodite is the official backend engine for PygmalionAI. It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to FasterTransformer).

Aphrodite builds upon and integrates the exceptional work from various projects, including:

Please note that Aphrodite is currently in active development and not yet fully functional.

Features

Continuous Batching
Efficient K/V management with PagedAttention
Optimized CUDA kernels for improved inference
Distributed inference
Multiple decoding algorithms (e.g. parallel sampling, beam search)

Requirements

Operating System: Linux
Python: at least 3.8

Supported GPUs

Basically, anything with a compute capability of 7.0 or higher. Here's a full list of supported consumer GPUs:

GPU	CC	GPU	CC	GPU	CC
2060	7.5	2070	7.5	2080	7.5
2080 Ti	7.5	Titan RTX	7.5	1650 Ti	7.5
3060	8.6	3060 Ti	8.6	3070	8.6
3070 Ti	8.6	3080	8.6	3080 Ti	8.6
3090	8.6	3090 Ti	8.6	4070 Ti	8.9
4080	8.9	4090	8.9

CC: Compute Capability

If your GPU isn't listed here, you won't be able to run Aphrodite.

Usage

*Currently not working, but this is how you'd run it once it's fixed.

Clone the repository:

git clone https://github.com/PygmalionAI/aphrodite-engine && cd aphrodite-engine

Install the package:

pip install -r requirements.txt
pip install -e .

Example usage:

from aphrodite import LLM, SamplingParams

prompts = [
"What is a man? A",
"The sun is a wondrous body, like a magnificent",
"All flesh is grass and all the comeliness thereof",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="PygmalionAI/pygmalion-350m")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
prompt = output.prompt
generated_text = output.outputs[0].text
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Contributing

We accept PRs! There will likely be a few typos or other errors we've failed to catch, so please let us know either via an issue or make a Pull Request.

README.md