AlpinDale
|
ae04f57ec1
feat: Pipeline Parallel support (#581)
|
7 months ago |
AlpinDale
|
405bb74612
Control plane comms refactor (#573)
|
7 months ago |
AlpinDale
|
2c321ce1f2
chore: upgrade to rocm 6.1, update docker
|
7 months ago |
AlpinDale
|
323fe23b21
chore: use 127.0.0.1 for single-node setups
|
7 months ago |
AlpinDale
|
a89c9a0e92
fix: device ordinal issues with world_size and stuff
|
7 months ago |
AlpinDale
|
427ab15434
fix: check_health when world_size==1
|
7 months ago |
AlpinDale
|
05d6e43244
fix: `torch.compile()` with mp executor backend
|
7 months ago |
AlpinDale
|
5b0c11d190
support pipeline parallel pynccl groups
|
7 months ago |
AlpinDale
|
de62ceb18c
refactor: eliminate parallel worker per-step task scheduling overhead
|
7 months ago |
AlpinDale
|
236be273e5
feat: tensor parallel speculative decoding (#554)
|
7 months ago |
AlpinDale
|
c6a501f682
add multiprocessing executor; make ray optional
|
7 months ago |