AlpinDale a4cbcfe59f feat: disable logprob serialization to CPU for spec decode 5 months ago
__init__.py 9d81716bfd [v0.5.3] Release Candidate (#388) 8 months ago
batch_expansion.py 2c653a2268 fix: make speculative decoding work with per-request seed 5 months ago
draft_model_runner.py a4cbcfe59f feat: disable logprob serialization to CPU for spec decode 5 months ago
interfaces.py 3a53ff1e01 fix: raise an error for no draft token case when draft_tp>1 5 months ago
medusa_worker.py 16dff9babc chore: enable bonus token in spec decoding for KV cache based models 5 months ago
metrics.py 2ebb37d1ee update time since last collection for AsyncMetricsCollector 5 months ago
mlp_speculator_worker.py 16dff9babc chore: enable bonus token in spec decoding for KV cache based models 5 months ago
multi_step_worker.py dd18c5042c move prepare_inputs to the GPU (#596) 5 months ago
ngram_worker.py 16dff9babc chore: enable bonus token in spec decoding for KV cache based models 5 months ago
proposer_worker_base.py d638dc592d fix: some minor typing issues in spec decode 5 months ago
smaller_tp_proposer_worker.py 16dff9babc chore: enable bonus token in spec decoding for KV cache based models 5 months ago
spec_decode_worker.py a4cbcfe59f feat: disable logprob serialization to CPU for spec decode 5 months ago
target_model_runner.py a4cbcfe59f feat: disable logprob serialization to CPU for spec decode 5 months ago
top1_proposer.py 3a53ff1e01 fix: raise an error for no draft token case when draft_tp>1 5 months ago
util.py a4cbcfe59f feat: disable logprob serialization to CPU for spec decode 5 months ago