david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed (thanks to vLLM's Paged Attention). @ 22429990b20503e55eb8bd25e1fc3674288130e9

AlpinDale 7a313483f1 chore: move update_flash_attn_metadata to attn backend (#731)		3 months ago
..
__init__.py	9d81716bfd [v0.5.3] Release Candidate (#388)	8 months ago
abstract.py	7a313483f1 chore: move update_flash_attn_metadata to attn backend (#731)	3 months ago
blocksparse_attn.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
flash_attn.py	7a313483f1 chore: move update_flash_attn_metadata to attn backend (#731)	3 months ago
flashinfer.py	60b702a827 chore: register custom torch ops for flash-attn and flashinfer (#724)	3 months ago
ipex_attn.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
openvino.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
pallas.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
placeholder_attn.py	bf88c8567e feat: mamba model support (#674)	4 months ago
rocm_flash_attn.py	e200775863 feat: enable using fp8 kv and prefix caching with chunked prefill (#668)	4 months ago
torch_sdpa.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
utils.py	3bbb3f2086 feat: add numpy implementation of `compute_slot_mapping` (#678)	4 months ago
xformers.py	e200775863 feat: enable using fp8 kv and prefix caching with chunked prefill (#668)	4 months ago