backends | ca6b69966d | fix: explicitly end_forward() calls to flashinfer | 7 months ago
ops | 7d79c0e726 | chore: use nvml query to avoid accidental cuda initialization | 7 months ago
__init__.py | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 7 months ago
layer.py | 7e66e8f899 | fix: only add `Attention.kv_scale` if kv cache quant is enabled | 7 months ago
selector.py | b6e60143e7 | Flashinfer for prefill phase (#580) | 7 months ago