File | Commit | Message | Last updated
__init__.py | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 9 months ago
cpu_executor.py | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 8 months ago
executor_base.py | d8c4193704 | feat: Speculative Decoding using a draft model (#432) | 8 months ago
gpu_executor.py | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 8 months ago
neuron_executor.py | d8c4193704 | feat: Speculative Decoding using a draft model (#432) | 8 months ago
ray_gpu_executor.py | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 8 months ago