david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed (thanks to vLLM's Paged Attention). @ efc6f7fbec095560d4436b938d3f5c55ccaa6921

AlpinDale 3d72f05c7b feat: flattened 1D tensor -> 2D tensor (#85)		1 year ago
..
attention	4e71bd1d12 feat: add PagedAttention V2 kernels (#76)	1 year ago
quantization	ce5e2332ea fix: launch AWQ kernels on the current CUDAStream (#75)	1 year ago
activation.cpp	32844c1522 add GELU kernels and remove compile bloat	1 year ago
activation_kernels.cu	3d72f05c7b feat: flattened 1D tensor -> 2D tensor (#85)	1 year ago
attention.cpp	4e71bd1d12 feat: add PagedAttention V2 kernels (#76)	1 year ago
cache.cpp	081545bde6 fix: various CUDA kernel tweaks	1 year ago
cache_kernels.cu	3d72f05c7b feat: flattened 1D tensor -> 2D tensor (#85)	1 year ago
cuda_utils.cpp	75c27d3e65 massive overhaul	1 year ago
cuda_utils_kernels.cu	75c27d3e65 massive overhaul	1 year ago
dispatch_utils.h	32844c1522 add GELU kernels and remove compile bloat	1 year ago
layernorm.cpp	081545bde6 fix: various CUDA kernel tweaks	1 year ago
layernorm_kernels.cu	3d72f05c7b feat: flattened 1D tensor -> 2D tensor (#85)	1 year ago
pos_encoding.cpp	45f6d9f923 initial refactor commit	1 year ago
pos_encoding_kernels.cu	3d72f05c7b feat: flattened 1D tensor -> 2D tensor (#85)	1 year ago
quantization.cpp	0495c50a3e GPTQ+exllama support (#21)	1 year ago
reduction.cuh	081545bde6 fix: various CUDA kernel tweaks	1 year ago