PygmalionAI's large-scale inference engine
Aphrodite is the official backend engine for PygmalionAI. It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to FasterTransformer).
Aphrodite builds upon and integrates the exceptional work from various projects.
Basically, anything with a compute capability of 7.0 or higher. Here's a full list of supported consumer GPUs:
GPU | CC | GPU | CC | GPU | CC |
---|---|---|---|---|---|
2060 | 7.5 | 2070 | 7.5 | 2080 | 7.5 |
2080 Ti | 7.5 | Titan RTX | 7.5 | 1650 Ti | 7.5 |
3060 | 8.6 | 3060 Ti | 8.6 | 3070 | 8.6 |
3070 Ti | 8.6 | 3080 | 8.6 | 3080 Ti | 8.6 |
3090 | 8.6 | 3090 Ti | 8.6 | 4070 Ti | 8.9 |
4080 | 8.9 | 4090 | 8.9 | | |
* CC: Compute Capability
Most datacenter/workstation GPUs are supported, so long as they have a compute capability of 7.0 or higher.
If you're unsure, you can find out by opening a Python interpreter and running:

```python
>>> import torch
>>> print(torch.cuda.get_device_capability())
```

This should print something like `(7, 5)`, which would indicate a CC of 7.5.
If your GPU is not listed here or you do not meet the minimum CC, you will not be able to run Aphrodite.
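The minimum-CC rule above boils down to a simple tuple comparison (a minimal sketch; `is_supported` is a hypothetical helper for illustration, not part of Aphrodite):

```python
def is_supported(cc: tuple) -> bool:
    """Aphrodite needs a GPU with compute capability 7.0 or higher."""
    # Tuples compare element-wise, so (7, 5) >= (7, 0) but (6, 1) < (7, 0).
    return cc >= (7, 0)

print(is_supported((7, 5)))  # an RTX 2080 reports (7, 5): True
print(is_supported((6, 1)))  # a GTX 1080 reports (6, 1): False
```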
Aphrodite requires a slightly specialized environment to run, as the latest CUDA and GCC versions are not supported. You can use Conda to easily configure your environment.
```sh
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash ./Miniconda3*
```
You can follow the on-screen instructions, though you may want to set the installation directory to somewhere with a large empty storage space.
You can either source your shell script (`. ~/.bashrc` or `. ~/.zshrc`) or restart your terminal instance to begin using conda.
```sh
$ conda config --set auto_activate_base false
$ conda create -n aphrodite python=3.9
$ conda activate aphrodite
$ conda install -c conda-forge cudatoolkit-dev gcc=11.3 gxx=11.3
```
The last command will take a long time, depending on your internet speed.
Whenever you want to launch Aphrodite later on, make sure you run `conda activate aphrodite` first. The other steps outlined above are one-time only.
Clone the repository:
```sh
git clone https://github.com/PygmalionAI/aphrodite-engine && cd aphrodite-engine
```
Install the package:
```sh
pip install -e .
```
If you receive any import errors here, try running `pip install -r requirements.txt` first.
If you receive an error for CUDA version mismatch, run `which nvcc` and note down the output. For example, if your output is `/home/anon/miniconda3/envs/aphrodite/bin/nvcc`, run this command:

```sh
$ export CUDA_HOME=/home/anon/miniconda3/envs/aphrodite
```
Then run the installation command again.
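The `CUDA_HOME` value is simply the `which nvcc` output with the trailing `bin/nvcc` stripped off. A small sketch of that transformation (`cuda_home_from_nvcc` is a hypothetical helper, stdlib only):

```python
from pathlib import Path

def cuda_home_from_nvcc(nvcc_path: str) -> str:
    # nvcc lives at <CUDA_HOME>/bin/nvcc, so go up two directory levels
    return str(Path(nvcc_path).parent.parent)

print(cuda_home_from_nvcc("/home/anon/miniconda3/envs/aphrodite/bin/nvcc"))
# → /home/anon/miniconda3/envs/aphrodite
```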
You can use the `LLM` class directly from a Python script:
```python
from aphrodite import LLM, SamplingParams

prompts = [
    "What is a man? A",
    "The sun is a wondrous body, like a magnificent",
    "All flesh is grass and all the comeliness thereof",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="EleutherAI/pythia-70m")  # you can also use a local directory path
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
To launch the OpenAI-compatible API server:

```sh
$ python -m aphrodite.endpoints.openai.api_server --model EleutherAI/pythia-70m
```
```sh
$ curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "EleutherAI/pythia-70m",
        "prompt": "What is a man? A",
        "max_tokens": 512,
        "n": 2048,
        "temperature": 0.8
    }'
```
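The same request can be made from Python using only the standard library (a sketch assuming the server above is running on `localhost:8000`; the actual network call is left commented out so the snippet runs offline):

```python
import json
import urllib.request

# Request body mirroring the curl example above
payload = {
    "model": "EleutherAI/pythia-70m",
    "prompt": "What is a man? A",
    "max_tokens": 512,
    "temperature": 0.8,
}
request = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is up:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
#     print(body["choices"][0]["text"])
```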
For the full list of request parameters, see the OpenAI Completions API reference.
We accept PRs! There will likely be a few typos or other errors we've failed to catch, so please let us know via an issue or open a Pull Request.