| File | Commit | Commit message | Last change |
|---|---|---|---|
| __init__.py | 2bd6c92f73 | fix: lora inclusion in wheels | 11 months ago |
| fully_sharded_layers.py | e87c32bed3 | feat: full tensor parallel for LoRA layers (#545) | 6 months ago |
| layers.py | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 6 months ago |
| lora.py | e87c32bed3 | feat: full tensor parallel for LoRA layers (#545) | 6 months ago |
| models.py | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 6 months ago |
| punica.py | e87c32bed3 | feat: full tensor parallel for LoRA layers (#545) | 6 months ago |
| request.py | 5b0c11d190 | support pipeline parallel pynccl groups | 5 months ago |
| utils.py | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 6 months ago |
| worker_manager.py | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 6 months ago |