.. list-table::
   :header-rows: 1

   * - Name
     - Commit
     - Last commit message
     - Last updated
   * - ``backends``
     - 4a7cb8f232
     - rocm: add custom paged attention kernels for ROCm (#1043)
     - 2 months ago
   * - ``ops``
     - e200775863
     - feat: enable using fp8 kv and prefix caching with chunked prefill (#668)
     - 6 months ago
   * - ``__init__.py``
     - 1405051912
     - attention: add ``AttentionState`` abstraction (#863)
     - 3 months ago
   * - ``layer.py``
     - bf88c8567e
     - feat: mamba model support (#674)
     - 6 months ago
   * - ``selector.py``
     - 4ddc14d653
     - core: use flashinfer for FP8 KV when available (#944)
     - 2 months ago