__init__.py
|
f8dfac6372
chore: attention refactor and upstream sync apr01 (#365)
|
9 months ago |
cpu_executor.py
|
ef733aee43
implement ExecuteModelData to reduce executor complexity
|
6 months ago |
distributed_gpu_executor.py
|
de62ceb18c
refactor: eliminate parallel worker per-step task scheduling overhead
|
5 months ago |
executor_base.py
|
de62ceb18c
refactor: eliminate parallel worker per-step task scheduling overhead
|
5 months ago |
gpu_executor.py
|
236be273e5
feat: tensor parallel speculative decoding (#554)
|
6 months ago |
multiproc_gpu_executor.py
|
5b0c11d190
support pipeline parallel pynccl groups
|
5 months ago |
multiproc_worker_utils.py
|
eaa06fdd14
fix some f-strings
|
6 months ago |
neuron_executor.py
|
ef733aee43
implement ExecuteModelData to reduce executor complexity
|
6 months ago |
ray_gpu_executor.py
|
de62ceb18c
refactor: eliminate parallel worker per-step task scheduling overhead
|
5 months ago |
ray_utils.py
|
c6a501f682
add multiprocessing executor; make ray optional
|
6 months ago |