AlpinDale 9e73559eba make use of batched rotary embedding kernels to support long context lora 7 months ago
__init__.py 04b53d2db5 chore: add initializer files 1 year ago
cache_engine.py 50b7c13db0 refactor: attention selector (#552) 8 months ago
cpu_model_runner.py a94de94c44 refactor: combine the prefill and decode into a single API (#553) 7 months ago
cpu_worker.py 50b7c13db0 refactor: attention selector (#552) 8 months ago
embedding_model_runner.py a94de94c44 refactor: combine the prefill and decode into a single API (#553) 7 months ago
model_runner.py 9e73559eba make use of batched rotary embedding kernels to support long context lora 7 months ago
neuron_model_runner.py 35ae01d7ba refactor: attention metadata term 8 months ago
neuron_worker.py fca911ee0a vLLM Upstream Sync (#526) 8 months ago
worker.py 236be273e5 feat: tensor parallel speculative decoding (#554) 7 months ago
worker_base.py ef733aee43 implement ExecuteModelData to reduce executor complexity 8 months ago