| File | Commit | Message | Last updated |
| --- | --- | --- | --- |
| __init__.py | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |
| abstract.py | 1390915778 | multi-step: add support for flashinfer attention backend (#1033) | 2 months ago |
| blocksparse_attn.py | 1405051912 | attention: add `AttentionState` abstraction (#863) | 3 months ago |
| flash_attn.py | 1390915778 | multi-step: add support for flashinfer attention backend (#1033) | 2 months ago |
| flashinfer.py | c951a54d21 | fix: multi-step + flashinfer with cuda graphs (#1036) | 2 months ago |
| ipex_attn.py | 6951928522 | xpu: bump IPEX to 2.3, support GQA (#1042) | 2 months ago |
| openvino.py | 1405051912 | attention: add `AttentionState` abstraction (#863) | 3 months ago |
| pallas.py | 032974a28a | tpu: fix TPU type api (#975) | 2 months ago |
| placeholder_attn.py | 3bb0f07461 | chore: rename `task_handler` to `worker` (#985) | 2 months ago |
| rocm_flash_attn.py | 4a7cb8f232 | rocm: add custom paged attention kernels for ROCm (#1043) | 2 months ago |
| torch_sdpa.py | 1405051912 | attention: add `AttentionState` abstraction (#863) | 3 months ago |
| utils.py | 3bb0f07461 | chore: rename `task_handler` to `worker` (#985) | 2 months ago |
| xformers.py | 1405051912 | attention: add `AttentionState` abstraction (#863) | 3 months ago |