| Name | Commit | Commit message | Last updated |
|---|---|---|---|
| attention | 5e82533d02 | upstream: add option to specify tokenizer | 1 year ago |
| activation.cpp | 28866137ea | feat: add swiglu activation | 1 year ago |
| activation_kernels.cu | 28866137ea | feat: add swiglu activation | 1 year ago |
| attention.cpp | d40a8d6bb0 | chore: bind single_query_cached_kv_attention to python | 1 year ago |
| cache.cpp | a409431c40 | feat: draft for cuda kernels | 1 year ago |
| cache_kernels.cu | a409431c40 | feat: draft for cuda kernels | 1 year ago |
| layernorm.cpp | 0ec53128b6 | feat: add layernorm kernels | 1 year ago |
| layernorm_kernels.cu | 0ec53128b6 | feat: add layernorm kernels | 1 year ago |
| pos_encoding.cpp | 67a17a1e93 | feat: add rotary embeddings | 1 year ago |
| pos_encoding_kernels.cu | 67a17a1e93 | feat: add rotary embeddings | 1 year ago |
| reduction.cuh | 0ec53128b6 | feat: add layernorm kernels | 1 year ago |
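
The listing does not include the kernel sources themselves. As a rough, self-contained sketch of the kind of kernel `activation_kernels.cu` adds ("feat: add swiglu activation"): SwiGLU computes `SiLU(gate) * up`, where each input row concatenates a gate half and an up half. The names and layout below (`swiglu_kernel`, a `[gate | up]` row of width `2*d`, one block per token) are assumptions for illustration, not the repository's actual code.

```cuda
// Hypothetical SwiGLU sketch; not the repository's implementation.
#include <cstdio>
#include <cuda_runtime.h>

__device__ __forceinline__ float silu(float x) {
  return x / (1.0f + expf(-x));  // SiLU(x) = x * sigmoid(x)
}

// One block per token; threads stride over the hidden dimension d.
// Input row layout assumed: [gate_0..gate_{d-1} | up_0..up_{d-1}].
__global__ void swiglu_kernel(float* __restrict__ out,
                              const float* __restrict__ in, int d) {
  const float* row = in + blockIdx.x * 2 * d;
  for (int i = threadIdx.x; i < d; i += blockDim.x) {
    out[blockIdx.x * d + i] = silu(row[i]) * row[d + i];
  }
}

int main() {
  const int tokens = 2, d = 4;
  float h_in[tokens * 2 * d], h_out[tokens * d];
  for (int i = 0; i < tokens * 2 * d; ++i) h_in[i] = 0.1f * i;

  float *d_in, *d_out;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(h_out));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  swiglu_kernel<<<tokens, 128>>>(d_out, d_in, d);
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  for (int i = 0; i < tokens * d; ++i) printf("%f\n", h_out[i]);
  cudaFree(d_in); cudaFree(d_out);
  return 0;
}
```

Fusing the gate, SiLU, and elementwise multiply into one kernel avoids materializing the intermediate activation, which is the usual reason such an op gets its own `.cu` file.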
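
Likewise, `pos_encoding_kernels.cu` ("feat: add rotary embeddings") suggests a RoPE kernel. The sketch below rotates consecutive pairs `(x[2i], x[2i+1])` by an angle `pos * theta^(-2i/d)`; this interleaved-pair convention is one common RoPE variant, and the kernel name, in-place update, and `positions` array are assumptions for illustration only.

```cuda
// Hypothetical rotary-embedding (RoPE) sketch; not the repository's code.
#include <cstdio>
#include <cuda_runtime.h>

// One block per token; each pair of dims is rotated by pos * theta^(-2i/d).
__global__ void rope_kernel(float* __restrict__ x,
                            const int* __restrict__ positions,
                            int d, float theta) {
  float* row = x + blockIdx.x * d;
  const float pos = (float)positions[blockIdx.x];
  for (int i = threadIdx.x; i < d / 2; i += blockDim.x) {
    const float freq = powf(theta, -2.0f * i / d);
    const float c = cosf(pos * freq), s = sinf(pos * freq);
    const float x0 = row[2 * i], x1 = row[2 * i + 1];
    row[2 * i]     = x0 * c - x1 * s;  // standard 2D rotation
    row[2 * i + 1] = x0 * s + x1 * c;
  }
}

int main() {
  const int tokens = 2, d = 8;
  float h_x[tokens * d];
  int h_pos[tokens] = {0, 1};  // token at position 0 is left unchanged
  for (int i = 0; i < tokens * d; ++i) h_x[i] = 1.0f;

  float* d_x; int* d_pos;
  cudaMalloc(&d_x, sizeof(h_x));
  cudaMalloc(&d_pos, sizeof(h_pos));
  cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);
  cudaMemcpy(d_pos, h_pos, sizeof(h_pos), cudaMemcpyHostToDevice);
  rope_kernel<<<tokens, 64>>>(d_x, d_pos, d, 10000.0f);
  cudaMemcpy(h_x, d_x, sizeof(h_x), cudaMemcpyDeviceToHost);
  for (int i = 0; i < tokens * d; ++i) printf("%f\n", h_x[i]);
  cudaFree(d_x); cudaFree(d_pos);
  return 0;
}
```

Applying the rotation in place to the query/key vectors keeps RoPE a cheap, memory-bound pass, which is why it is typically a standalone kernel rather than part of the attention kernel.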