david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed (thanks to vLLM's PagedAttention). @ 0f4a9ee77be8a5a4a58d8a13d66eaab4e42bcf9e

AlpinDale 0f4a9ee77b quantized lm_head (#582)		6 months ago
..
attention	ca6b69966d fix: explicitly end_forward() calls to flashinfer	7 months ago
common	ae04f57ec1 feat: Pipeline Parallel support (#581)	6 months ago
distributed	ae04f57ec1 feat: Pipeline Parallel support (#581)	6 months ago
endpoints	63b735bc2a chore: optimize v2 block manager to match the performance of v1	6 months ago
engine	ae04f57ec1 feat: Pipeline Parallel support (#581)	6 months ago
executor	ae04f57ec1 feat: Pipeline Parallel support (#581)	6 months ago
inputs	3a0fdf7b9b chore: remove `image_input_type` from VLM config	6 months ago
kv_quant	e42a78381a feat: switch from pylint to ruff (#322)	1 year ago
lora	0f4a9ee77b quantized lm_head (#582)	6 months ago
modeling	0f4a9ee77b quantized lm_head (#582)	6 months ago
multimodal	dd378ea063 feat: MLPSpeculator with tensor parallel	6 months ago
processing	ae04f57ec1 feat: Pipeline Parallel support (#581)	6 months ago
quantization	0f4a9ee77b quantized lm_head (#582)	6 months ago
spec_decode	ae04f57ec1 feat: Pipeline Parallel support (#581)	6 months ago
task_handler	ae04f57ec1 feat: Pipeline Parallel support (#581)	6 months ago
transformers_utils	3a0fdf7b9b chore: remove `image_input_type` from VLM config	6 months ago
__init__.py	a07fc83bc8 chore: proper util for aphrodite version	7 months ago
_custom_ops.py	c0c336aaa3 refactor: registry for processing model inputs; quick_gelu; clip model support	7 months ago
_ipex_ops.py	6a57861fca feat: initial XPU support via intel_extension_for_pytorch (#571)	7 months ago
py.typed	1c988a48b2 fix logging and add py.typed	1 year ago
version.py	7e54c3916d chore: factor out epilogues from cutlass kernels	7 months ago