david/aphrodite-engine @ mqaphrodite: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users at blazing-fast speeds (thanks to vLLM's Paged Attention).

AlpinDale	b47a39026d feat: introduce MQAphroditeEngine	1 week ago
async_aphrodite	055c8905a3 api: add sampling/engine option to return only deltas or final output (#1035)	1 week ago
basic_correctness	304e1e5a8a core: dump model runner inputs during crash (#1023)	1 week ago
benchmarks	b47a39026d feat: introduce MQAphroditeEngine	1 week ago
compile	239a8cae25 torch.compile: register all-reduce operations as custom ops (#1050)	1 week ago
core	f7f3fed265 feat: add async postprocessor (#925)	2 weeks ago
distributed	4d14bd1fe5 vlm: add multi-input support for LLaVA and InternVL models (#1002)	2 weeks ago
encoder_decoder	a985143768 core: add cuda graph support for encoder-decoder models (#1051)	1 week ago
endpoints	f644e10449 vlm: enable multimodal inputs for the LLM class (#992)	2 weeks ago
engine	3bb0f07461 chore: rename `task_handler` to `worker` (#985)	2 weeks ago
kernels	9bdf8d5bfa mamba: enable continuous batching for mamba kernels (#1055)	1 week ago
lora	3bb0f07461 chore: rename `task_handler` to `worker` (#985)	2 weeks ago
metrics	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
modeling	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
models	8d5d87e687 vlm: support multiple images for qwen-vl (#1031)	1 week ago
multi_step	0dfa6b60ec core: support logprobs with multi-step scheduling (#963)	2 weeks ago
multimodal	2aabf8fcf7 vlm: fix errors on ragged NestedTensors (#953)	2 weeks ago
plugins	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
prefix_caching	3d83e64f8e feat: add metrics for prefix cache hit rate (#829)	1 month ago
prompt_adapter	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
prompts	e1f3fd1e02 fix: test units (#201)	1 year ago
quantization	6bdff60aab quant: support pre-quanted bitsandbytes checkpoints (#961)	2 weeks ago
samplers	2150bb5019 sampler: add range parameter for DRY (#855)	1 month ago
spec_decode	0859dc3bc0 tests: refactor speculative decoding tests to remove the async engine (#1021)	1 week ago
tensorizer_loader	673621a3d2 xpu: refactor the model runner for tensor parallelism (#910)	3 weeks ago
tokenization	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
tool_use	0191c5efd1 tools: fix tool calls to more strictly follow OpenAI format (#1003)	2 weeks ago
tpu	ea59784f59 tpu: remove torch._dynamo.reset() (#952)	2 weeks ago
weight_loading	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
worker	a985143768 core: add cuda graph support for encoder-decoder models (#1051)	1 week ago
__init__.py	2755a48d51 merge dev branch into main (#153)	1 year ago
conftest.py	1721bea53a vlm: add support for Pixtral model (#1022)	1 week ago
test_cache_block_hashing.py	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
test_config.py	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
test_embedded_commit.py	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
test_inputs.py	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
test_logits_processor.py	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
test_regression.py	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
test_sampling_params.py	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
test_scalartype.py	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
test_sequence.py	0dfa6b60ec core: support logprobs with multi-step scheduling (#963)	2 weeks ago
test_sharded_state_loader.py	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
test_utils.py	c6c91edab7 ci: update & overhaul test units (#769)	1 month ago
utils.py	18acf7eaa0 tests: map physical device indices for test utils	1 week ago