| File | Commit | Message | Last updated |
|---|---|---|---|
| __init__.py | 04b53d2db5 | chore: add initializer files | 1 year ago |
| cache_engine.py | 50b7c13db0 | refactor: attention selector (#552) | 8 months ago |
| cpu_model_runner.py | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 7 months ago |
| cpu_worker.py | 50b7c13db0 | refactor: attention selector (#552) | 8 months ago |
| embedding_model_runner.py | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 7 months ago |
| model_runner.py | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| neuron_model_runner.py | 35ae01d7ba | refactor: attention metadata term | 8 months ago |
| neuron_worker.py | fca911ee0a | vLLM Upstream Sync (#526) | 8 months ago |
| worker.py | 236be273e5 | feat: tensor parallel speculative decoding (#554) | 7 months ago |
| worker_base.py | ef733aee43 | implement ExecuteModelData to reduce executor complexity | 8 months ago |