.. |
__init__.py
|
04b53d2db5
chore: add initializer files
|
1 year ago |
cache_engine.py
|
a1f18f17e6
modify the cache engine and model runner/worker to support mamba states
|
8 months ago |
cpu_model_runner.py
|
6e0761ba5d
make init_distributed_environment compatible with init_process_group
|
9 months ago |
cpu_worker.py
|
083ba7b452
roll back chunked prefill changes to SDPA, isolate cpu worker
|
9 months ago |
model_runner.py
|
a1f18f17e6
modify the cache engine and model runner/worker to support mamba states
|
8 months ago |
neuron_model_runner.py
|
0f1399c135
feat: attention refactor part 2
|
9 months ago |
neuron_worker.py
|
4d33ce60da
feat: Triton flash attention backend for ROCm (#407)
|
9 months ago |
worker.py
|
a1f18f17e6
modify the cache engine and model runner/worker to support mamba states
|
8 months ago |
worker_base.py
|
8c67b37131
fix docstrings
|
9 months ago |