david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's PagedAttention. @ 172dee25731746a9e8f916ba2d4a8c6b71c16f02

AlpinDale 3e6addcc2c LLM: enable batched inference for llm.chat() API (#1120)		5 days ago
audio	8eb4a3cfd3 vlm: support multiple audios per prompt for Ultravox (#990)	1 month ago
chat_templates	485d1de42e fix: hermes tool call chat template (#999)	1 month ago
fp8	f1d0b77c92 [0.6.0] Release Candidate (#481)	5 months ago
marlin	8a71788372 Add OLMoE (#772)	4 months ago
monitoring	b26a014b12 fix: prometheus.yaml path in monitoring example (#969)	1 month ago
offline_inference	3e6addcc2c LLM: enable batched inference for llm.chat() API (#1120)	5 days ago
openai_api	313e198557 api: implement OpenAI-compatible tools API for Hermes/Mistral models (#993)	1 month ago
vision	a5bfc2bc3d VLM: add support for LLaVA-Onevision model (#1100)	2 weeks ago
aphrodite_engine_example.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	5 months ago
api_client.py	f32d57ed04 add inference examples	1 year ago
gguf_to_torch.py	9d81716bfd [v0.5.3] Release Candidate (#388)	9 months ago
gradio_server.py	e42a78381a feat: switch from pylint to ruff (#322)	11 months ago
run_cluster.sh	f1d0b77c92 [0.6.0] Release Candidate (#481)	5 months ago
save_sharded_state.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	5 months ago
tensorize_aphrodite_model.py	22a4cd4595 core: fix spec decode metrics and envs circular import (#889)	1 month ago
xqa_attn.py	949f974c59 (1/N) XQA: integrate the XQA CUDA kernels within Aphrodite (#1115)	1 week ago