PygmalionAI's large-scale inference engine
pygmalion.chat

It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's Paged Attention.

32 Commits

79 Branches

36 Releases

AlpinDale d40a8d6bb0 chore: bind single_query_cached_kv_attention to python	1 year ago
aphrodite	b48fe85378 chore: utilities for modeling	1 year ago
assets	fefbf029c9 revert previous commit	1 year ago
kernels	d40a8d6bb0 chore: bind single_query_cached_kv_attention to python	1 year ago
.gitignore	3c3944153c feat: add generic attention and FP32 dtype kernels	1 year ago
LICENSE	fefbf029c9 revert previous commit	1 year ago
README.md	fefbf029c9 revert previous commit	1 year ago
requirements.txt	fefbf029c9 revert previous commit	1 year ago

Aphrodite - The Pygmalion Backend

Work in Progress

Aphrodite is the backend service for PygmalionAI, built on top of FastChat, vLLM, SkyPilot, and more.

Currently a work in progress, not remotely functional.
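
For readers unfamiliar with the Paged Attention idea credited in the repository description, here is a minimal conceptual sketch. It is not Aphrodite's or vLLM's code. It only illustrates the general bookkeeping: the KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping its logical token positions to physical blocks. The class `PagedKVCache`, the constant `BLOCK_SIZE`, and every method name are hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block; hypothetical value, kept small for readability


@dataclass
class PagedKVCache:
    """Toy block-table allocator. Illustrates the paging idea only; stores no tensors."""

    num_blocks: int
    free_blocks: list[int] = field(init=False)
    block_tables: dict[str, list[int]] = field(default_factory=dict)
    seq_lens: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every physical block starts out unallocated.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one new token and return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:
            # The sequence has no block yet, or its last block is full.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop(0))
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % BLOCK_SIZE

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=3)
    for _ in range(5):
        print("a ->", cache.append_token("a"))
    print("b ->", cache.append_token("b"))
    cache.free("a")
    print("free blocks:", cache.free_blocks)
```

Because blocks are allocated only when a sequence actually grows into them, memory does not have to be reserved up front for each request's maximum length, which is the property that helps when serving many users concurrently.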
