| Name | Commit | Message | Last updated |
| --- | --- | --- | --- |
| all_reduce | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |
| attention | 251568470e | initial nvidia fp8 e4m3 for kv cache | 7 months ago |
| backup | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 11 months ago |
| cpu | b746fb5562 | fix a few warnings on the cpu kernels | 7 months ago |
| hadamard | 5d288aa76c | feat: add fast hadamard transformation kernels (#232) | 1 year ago |
| moe | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |
| punica | e3f2ea4850 | make punica kernels work with rocm | 7 months ago |
| quantization | ad1c6b86a1 | gptq_marlin: enable bfloat16 | 7 months ago |
| activation_kernels.cu | 3d6695cfbb | feat: add approximate gelu activation kernels (#370) | 11 months ago |
| cache.h | 251568470e | initial nvidia fp8 e4m3 for kv cache | 7 months ago |
| cache_kernels.cu | 251568470e | initial nvidia fp8 e4m3 for kv cache | 7 months ago |
| cuda_compat.h | e3f2ea4850 | make punica kernels work with rocm | 7 months ago |
| cuda_utils.h | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 1 year ago |
| cuda_utils_kernels.cu | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 1 year ago |
| dispatch_utils.h | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 11 months ago |
| layernorm_kernels.cu | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |
| ops.h | 35ae01d7ba | refactor: attention metadata term | 7 months ago |
| pos_encoding_kernels.cu | e702f587cf | feat: add batched RoPE kernels (#371) | 11 months ago |
| pybind.cpp | 2351a0e2cd | feat: FlashInfer backend for decoding phase (#548) | 7 months ago |
| reduction.cuh | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |