AlpinDale
|
6124140a45
fix: remove error_on_invalid_device_count_status
|
5 months ago |
AlpinDale
|
8c6ca4220b
fix: torch.set_num_threads() in multiproc_gpu_executor
|
5 months ago |
AlpinDale
|
375b935d41
fix: pass signal from the main thread
|
5 months ago |
AlpinDale
|
45a004874c
chore: allow specifying custom Executor
|
5 months ago |
AlpinDale
|
b7a2d52e47
fix: allow using mp executor for pipeline parallel
|
5 months ago |
AlpinDale
|
0c72961a12
chore: shutdown method for multiproc executor
|
5 months ago |
AlpinDale
|
c8d398a4ae
feat: add custom triton cache manager
|
5 months ago |
AlpinDale
|
0061aea5d5
fix: prevent contention amongst shards by setting OMP_NUM_THREADS=1
|
5 months ago |
AlpinDale
|
5257ebce8c
fix: device >= 0 && device < num_gpus INTERNAL_ASSERT FAILED
|
5 months ago |
AlpinDale
|
ae04f57ec1
feat: Pipeline Parallel support (#581)
|
5 months ago |
AlpinDale
|
405bb74612
Control plane comms refactor (#573)
|
6 months ago |
AlpinDale
|
2c321ce1f2
chore: upgrade to rocm 6.1, update docker
|
6 months ago |
AlpinDale
|
323fe23b21
chore: use 127.0.0.1 for single-node setups
|
6 months ago |
AlpinDale
|
a89c9a0e92
fix: device ordinal issues with world_size and stuff
|
6 months ago |
AlpinDale
|
427ab15434
fix: check_health when world_size==1
|
6 months ago |
AlpinDale
|
05d6e43244
fix: `torch.compile()` with mp executor backend
|
6 months ago |
AlpinDale
|
5b0c11d190
support pipeline parallel pynccl groups
|
6 months ago |
AlpinDale
|
de62ceb18c
refactor: eliminate parallel worker per-step task scheduling overhead
|
6 months ago |
AlpinDale
|
236be273e5
feat: tensor parallel speculative decoding (#554)
|
6 months ago |
AlpinDale
|
c6a501f682
add multiprocessing executor; make ray optional
|
6 months ago |