File | Commit | Message | Last updated
__init__.py | 04b53d2db5 | chore: add initializer files | 1 year ago
cache_engine.py | 5289c14b24 | feat: Asymmetric Tensor Parallel (#594) | 4 months ago
cpu_model_runner.py | 705e50f4bd | fix: broadcasting logic for multi_modal_kwargs | 4 months ago
cpu_worker.py | 42c66d5b00 | feat: tensor parallelism for CPU backend | 4 months ago
embedding_model_runner.py | 705e50f4bd | fix: broadcasting logic for multi_modal_kwargs | 4 months ago
model_runner.py | 4d4e767838 | ci: take one of fixing lint issues | 4 months ago
model_runner_base.py | d8a51d05a7 | fix: seeded gens with pipeline parallel | 4 months ago
neuron_model_runner.py | 705e50f4bd | fix: broadcasting logic for multi_modal_kwargs | 4 months ago
neuron_worker.py | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 4 months ago
openvino_model_runner.py | 705e50f4bd | fix: broadcasting logic for multi_modal_kwargs | 4 months ago
openvino_worker.py | 1ff6d4c3d7 | feat: support pipeline parallel on indivisible GPU count (#587) | 4 months ago
tpu_model_runner.py | eef647deab | fix: greedy decoding in TPU | 4 months ago
tpu_worker.py | 269e9aabda | fix: set readonly=True for non-root TPU devices | 4 months ago
worker.py | 6979ff658e | chore: perform allreduce in fp32 for marlin, better logging | 4 months ago
worker_base.py | 523ac99aca | chore: pipeline parallel with Ray accelerated dag | 4 months ago
xpu_model_runner.py | 705e50f4bd | fix: broadcasting logic for multi_modal_kwargs | 4 months ago
xpu_worker.py | 99680b2d23 | feat: soft prompts (#589) | 4 months ago