.. |
__init__.py
|
04b53d2db5
chore: add initializer files
|
1 year ago |
cache_engine.py
|
f40b809d3b
allow using v2 block manager with sliding window
|
7 months ago |
cpu_model_runner.py
|
8d77c69cbd
feat: support image processor and add llava example
|
7 months ago |
cpu_worker.py
|
50b7c13db0
refactor: attention selector (#552)
|
7 months ago |
embedding_model_runner.py
|
8d77c69cbd
feat: support image processor and add llava example
|
7 months ago |
model_runner.py
|
6cecbbff6a
fix: reduce memory footprint of cuda graph by adding output buffer
|
7 months ago |
neuron_model_runner.py
|
35ae01d7ba
refactor: attention metadata term
|
8 months ago |
neuron_worker.py
|
fca911ee0a
vLLM Upstream Sync (#526)
|
8 months ago |
worker.py
|
eb2c5c77df
feat: enforce the max possible seqlen
|
7 months ago |
worker_base.py
|
7194047318
remove vllm-nccl
|
7 months ago |