david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to vLLM's Paged Attention). @ 1efd0f89b7351ccfc93cfb0faefa4edd7be5462f

AlpinDale 1efd0f89b7 feat: support FP8 for DeepSeekV2 MoE		6 months ago
..
fused_moe	1efd0f89b7 feat: support FP8 for DeepSeekV2 MoE	6 months ago
mamba	5be90c3859 Mamba infrastrucuture support (#586)	6 months ago
ops	fca911ee0a vLLM Upstream Sync (#526)	8 months ago
__init__.py	07aa2a492f upstream: add option to specify tokenizer	1 year ago
activation.py	c0c336aaa3 refactor: registry for processing model inputs; quick_gelu; clip model support	7 months ago
layernorm.py	5761ef8c35 feat: gemma-2 support	6 months ago
linear.py	d2f38f6f81 chore: remove separate bias add	6 months ago
logits_processor.py	5761ef8c35 feat: gemma-2 support	6 months ago
pooler.py	be8154a8a0 feat: proper embeddings API with e5-mistral-7b support	7 months ago
rejection_sampler.py	7253e9052d feat: integrate typical acceptance sampling for spec decoding	7 months ago
rotary_embedding.py	5761ef8c35 feat: gemma-2 support	6 months ago
sampler.py	be8154a8a0 feat: proper embeddings API with e5-mistral-7b support	7 months ago
spec_decode_base_sampler.py	7253e9052d feat: integrate typical acceptance sampling for spec decoding	7 months ago
typical_acceptance_sampler.py	7253e9052d feat: integrate typical acceptance sampling for spec decoding	7 months ago
vocab_parallel_embedding.py	0f4a9ee77b quantized lm_head (#582)	6 months ago