PygmalionAI's large-scale inference engine
pygmalion.chat



Breathing Life into Language

aphrodite

Aphrodite is the official backend engine for PygmalionAI. It is designed to serve as the inference endpoint for the PygmalionAI website and to allow serving the Pygmalion models to a large number of users at blazing-fast speeds (thanks to FasterTransformer).

Aphrodite builds upon and integrates the exceptional work of several other projects.

Please note that Aphrodite is currently in active development and not yet fully functional.

Features

  • Continuous Batching
  • Efficient K/V management with PagedAttention
  • Optimized CUDA kernels for improved inference
  • Distributed inference
  • Multiple decoding algorithms (e.g. parallel sampling, beam search)
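To give a feel for what PagedAttention-style K/V management means, here is a toy sketch of the core idea: the cache is split into fixed-size blocks that are allocated on demand and looked up through a per-sequence block table, instead of reserving one large contiguous buffer per sequence. This is a conceptual illustration only, not Aphrodite's actual implementation; all names (`BlockTable`, `Allocator`, `BLOCK_SIZE`) are made up for the sketch.

```python
# Toy illustration of paged K/V cache management: fixed-size blocks,
# allocated on demand, addressed through a per-sequence block table.
# NOT Aphrodite's real code -- a conceptual sketch only.

BLOCK_SIZE = 4  # tokens per cache block (real engines use e.g. 16)

class Allocator:
    """Hands out free physical block ids from a fixed pool."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def allocate(self):
        return self.free.pop()

class BlockTable:
    """Maps a sequence's logical token positions to physical blocks."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.blocks = []  # physical block ids, in logical order

    def append_token(self, position):
        # Allocate a new physical block only when the current one fills up.
        if position // BLOCK_SIZE >= len(self.blocks):
            self.blocks.append(self.allocator.allocate())

    def physical_slot(self, position):
        # Translate a logical token position into (block id, offset).
        return self.blocks[position // BLOCK_SIZE], position % BLOCK_SIZE

pool = Allocator(num_blocks=8)
seq = BlockTable(pool)
for pos in range(6):  # cache K/V for 6 tokens
    seq.append_token(pos)

# 6 tokens with BLOCK_SIZE=4 occupy only 2 blocks; memory grows on demand
# instead of being reserved up front for the maximum sequence length.
print(len(seq.blocks))       # 2
print(seq.physical_slot(5))  # (second block's id, offset 1)
```

Because blocks are allocated lazily, short sequences never pay for the memory of the longest possible sequence, which is what lets the engine batch many more concurrent users.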

Requirements

You will need CUDA 11.0 or later and a GPU with Compute Capability 7.0 or higher. CUDA 12.0 is unsupported, so please use CUDA 11.8 instead.
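A quick sanity check of those two requirements can be expressed as simple tuple comparisons. The version values below are placeholders; on a real machine you would read the CUDA version from your driver (e.g. `nvidia-smi`) and the compute capability from your GPU's spec sheet.

```python
# Sanity-check the requirements above: CUDA >= 11.0 (but not 12.x)
# and Compute Capability >= 7.0. Values here are illustrative.

def meets_requirements(cuda_version: tuple, capability: tuple) -> bool:
    cuda_ok = (11, 0) <= cuda_version < (12, 0)  # CUDA 12 is unsupported
    capability_ok = capability >= (7, 0)
    return cuda_ok and capability_ok

print(meets_requirements((11, 8), (8, 6)))  # True  - e.g. an RTX 3090
print(meets_requirements((12, 0), (8, 9)))  # False - CUDA 12 unsupported
print(meets_requirements((11, 8), (6, 1)))  # False - capability too low
```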

Note that Aphrodite is Linux-only.

Contributing

We accept PRs! There are likely a few typos or other errors we've failed to catch, so please let us know by opening an issue or submitting a Pull Request.