.. |
__init__.py
|
f8dfac6372
chore: attention refactor and upstream sync apr01 (#365)
|
11 months ago |
cpu_executor.py
|
ef733aee43
implement ExecuteModelData to reduce executor complexity
|
7 months ago |
distributed_gpu_executor.py
|
7bcff4ac03
implement sharded state dict
|
7 months ago |
executor_base.py
|
ef733aee43
implement ExecuteModelData to reduce executor complexity
|
7 months ago |
gpu_executor.py
|
236be273e5
feat: tensor parallel speculative decoding (#554)
|
7 months ago |
multiproc_gpu_executor.py
|
236be273e5
feat: tensor parallel speculative decoding (#554)
|
7 months ago |
multiproc_worker_utils.py
|
eaa06fdd14
fix some f-strings
|
7 months ago |
neuron_executor.py
|
ef733aee43
implement ExecuteModelData to reduce executor complexity
|
7 months ago |
ray_gpu_executor.py
|
9f3d6205ce
fix ray gpu executor
|
7 months ago |
ray_utils.py
|
c6a501f682
add multiprocessing executor; make ray optional
|
7 months ago |