__init__.py | 9d81716bfd [v0.5.3] Release Candidate (#388) | 8 months ago
batch_expansion.py | 2c653a2268 fix: make speculative decoding work with per-request seed | 5 months ago
draft_model_runner.py | a4cbcfe59f feat: disable logprob serialization to CPU for spec decode | 5 months ago
interfaces.py | 3a53ff1e01 fix: raise an error for no draft token case when draft_tp>1 | 5 months ago
medusa_worker.py | 16dff9babc chore: enable bonus token in spec decoding for KV cache based models | 5 months ago
metrics.py | 2ebb37d1ee update time since last collection for AsyncMetricsCollector | 5 months ago
mlp_speculator_worker.py | 16dff9babc chore: enable bonus token in spec decoding for KV cache based models | 5 months ago
multi_step_worker.py | dd18c5042c move prepare_inputs to the GPU (#596) | 5 months ago
ngram_worker.py | 16dff9babc chore: enable bonus token in spec decoding for KV cache based models | 5 months ago
proposer_worker_base.py | d638dc592d fix: some minor typing issues in spec decode | 5 months ago
smaller_tp_proposer_worker.py | 16dff9babc chore: enable bonus token in spec decoding for KV cache based models | 5 months ago
spec_decode_worker.py | a4cbcfe59f feat: disable logprob serialization to CPU for spec decode | 5 months ago
target_model_runner.py | a4cbcfe59f feat: disable logprob serialization to CPU for spec decode | 5 months ago
top1_proposer.py | 3a53ff1e01 fix: raise an error for no draft token case when draft_tp>1 | 5 months ago
util.py | a4cbcfe59f feat: disable logprob serialization to CPU for spec decode | 5 months ago