david/aphrodite-engine @ sampling-kernels
PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's Paged Attention.

AlpinDale d9f4c36edd feat: Medusa speculative decoding support (#590)		5 months ago
__init__.py	d9f4c36edd feat: Medusa speculative decoding support (#590)	5 months ago
arctic.py	1e35cef979 feat: add arctic snowflake model (#551)	6 months ago
chatglm.py	9e73559eba make use of batched rotary embedding kernels to support long context lora	6 months ago
dbrx.py	fca911ee0a vLLM Upstream Sync (#526)	6 months ago
falcon.py	fca911ee0a vLLM Upstream Sync (#526)	6 months ago
jais.py	fca911ee0a vLLM Upstream Sync (#526)	6 months ago
medusa.py	d9f4c36edd feat: Medusa speculative decoding support (#590)	5 months ago
mlp_speculator.py	de7e6919c0 feat: support tied weights and input scale for MLPSpeculator	5 months ago
mpt.py	fca911ee0a vLLM Upstream Sync (#526)	6 months ago