| File | Commit | Message | Last change |
|---|---|---|---|
| __init__.py | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 11 months ago |
| cpu_executor.py | ef733aee43 | implement ExecuteModelData to reduce executor complexity | 7 months ago |
| distributed_gpu_executor.py | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| executor_base.py | de62ceb18c | refactor: eliminate parallel worker per-step task scheduling overhead | 7 months ago |
| gpu_executor.py | 236be273e5 | feat: tensor parallel speculative decoding (#554) | 7 months ago |
| multiproc_gpu_executor.py | a89c9a0e92 | fix: device ordinal issues with world_size and stuff | 7 months ago |
| multiproc_worker_utils.py | fa58ba87a3 | fix: only set executor backend to mp if not multi-node | 7 months ago |
| neuron_executor.py | ef733aee43 | implement ExecuteModelData to reduce executor complexity | 7 months ago |
| ray_gpu_executor.py | dfa59bc5f9 | fix: 16 GPUs in a cluster | 7 months ago |
| ray_utils.py | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 7 months ago |
| ray_xpu_executor.py | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 7 months ago |
| tpu_executor.py | fe21123a1c | feat: TPU support (#570) | 7 months ago |
| xpu_executor.py | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 7 months ago |