AlpinDale 4a7cb8f232 rocm: add custom paged attention kernels for ROCm (#1043) 2 months ago
backends 4a7cb8f232 rocm: add custom paged attention kernels for ROCm (#1043) 2 months ago
ops e200775863 feat: enable using fp8 kv and prefix caching with chunked prefill (#668) 6 months ago
__init__.py 1405051912 attention: add `AttentionState` abstraction (#863) 3 months ago
layer.py bf88c8567e feat: mamba model support (#674) 6 months ago
selector.py 4ddc14d653 core: use flashinfer for FP8 KV when available (#944) 2 months ago
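The commit messages above suggest a backend-selection pattern: `selector.py` picks an attention backend based on platform and KV-cache dtype (e.g. preferring FlashInfer for FP8 KV, or ROCm-specific paged-attention kernels on AMD GPUs). Below is a minimal, hypothetical sketch of that pattern; the names `Backend` and `which_attn_backend` are illustrative assumptions, not the module's actual API.

```python
# Hypothetical sketch of a backend-selection pattern like the one
# selector.py appears to implement. Names are assumptions, not the
# actual Aphrodite API.
import enum

import torch


class Backend(enum.Enum):
    FLASHINFER = enum.auto()   # preferred for FP8 KV cache (per #944)
    ROCM_PAGED = enum.auto()   # custom ROCm paged-attention kernels (per #1043)
    TORCH_SDPA = enum.auto()   # generic fallback


def which_attn_backend(kv_cache_dtype: str) -> Backend:
    """Pick an attention backend from platform and KV-cache dtype."""
    if not torch.cuda.is_available():
        return Backend.TORCH_SDPA           # no GPU: generic fallback
    if torch.version.hip is not None:       # ROCm build of PyTorch
        return Backend.ROCM_PAGED
    if kv_cache_dtype.startswith("fp8"):
        return Backend.FLASHINFER           # FP8 KV -> FlashInfer when available
    return Backend.TORCH_SDPA


if __name__ == "__main__":
    print(which_attn_backend("fp8_e4m3"))
```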