david/aphrodite-engine @ control-vectors
PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's Paged Attention.

AlpinDale a4cbcfe59f feat: disable logprob serialization to CPU for spec decode		5 months ago
..
__init__.py	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago
batch_expansion.py	2c653a2268 fix: make speculative decoding work with per-request seed	5 months ago
draft_model_runner.py	a4cbcfe59f feat: disable logprob serialization to CPU for spec decode	5 months ago
interfaces.py	3a53ff1e01 fix: raise an error for no draft token case when draft_tp>1	5 months ago
medusa_worker.py	16dff9babc chore: enable bonus token in spec decoding for KV cache based models	5 months ago
metrics.py	2ebb37d1ee update time since last collection for AsyncMetricsCollector	5 months ago
mlp_speculator_worker.py	16dff9babc chore: enable bonus token in spec decoding for KV cache based models	5 months ago
multi_step_worker.py	dd18c5042c move prepare_inputs to the GPU (#596)	5 months ago
ngram_worker.py	16dff9babc chore: enable bonus token in spec decoding for KV cache based models	5 months ago
proposer_worker_base.py	d638dc592d fix: some minor typing issues in spec decode	5 months ago
smaller_tp_proposer_worker.py	16dff9babc chore: enable bonus token in spec decoding for KV cache based models	5 months ago
spec_decode_worker.py	a4cbcfe59f feat: disable logprob serialization to CPU for spec decode	5 months ago
target_model_runner.py	a4cbcfe59f feat: disable logprob serialization to CPU for spec decode	5 months ago
top1_proposer.py	3a53ff1e01 fix: raise an error for no draft token case when draft_tp>1	5 months ago
util.py	a4cbcfe59f feat: disable logprob serialization to CPU for spec decode	5 months ago