david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's Paged Attention.

AlpinDale 145e554a4d neuron: add 8bit quantization for Neuron (#994)		2 weeks ago
arctic_inference.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
cached_prefix_inference.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
embedding_inference.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
encoder_decoder_inference.py	62111fab17 feat: allow serving encoder-decoder models in the API server (#664)	4 months ago
gguf_inference.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
lora_aphrodite_engine.py	673621a3d2 xpu: refactor the model runner for tensor parallelism (#910)	3 weeks ago
lora_async_aphrodite.py	673621a3d2 xpu: refactor the model runner for tensor parallelism (#910)	3 weeks ago
mlpspeculator_inference.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
neuron_inference.py	ba6d798784 neuron: support for context length and token bucketing (#960)	2 weeks ago
neuron_int8_quantization.py	145e554a4d neuron: add 8bit quantization for Neuron (#994)	2 weeks ago
offline_inference.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
ray_distributed_inference.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
soft_prompt_inference.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
tpu_inference.py	436d8fa0f1 core: do not compile for profiling (#931)	2 weeks ago