david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's PagedAttention. @ ad1c6b86a1fa85e28c0a3e177a46f481d0cc4642

AlpinDale ad1c6b86a1 gptq_marlin: enable bfloat16	7 months ago
..
all_reduce	9d81716bfd [v0.5.3] Release Candidate (#388)	10 months ago
attention	251568470e initial nvidia fp8 e4m3 for kv cache	7 months ago
backup	f8dfac6372 chore: attention refactor and upstream sync apr01 (#365)	11 months ago
cpu	b746fb5562 fix a few warnings on the cpu kernels	7 months ago
hadamard	5d288aa76c feat: add fast hadamard transformation kernels (#232)	1 year ago
moe	9d81716bfd [v0.5.3] Release Candidate (#388)	10 months ago
punica	e3f2ea4850 make punica kernels work with rocm	7 months ago
quantization	ad1c6b86a1 gptq_marlin: enable bfloat16	7 months ago
activation_kernels.cu	3d6695cfbb feat: add approximate gelu activation kernels (#370)	11 months ago
cache.h	251568470e initial nvidia fp8 e4m3 for kv cache	7 months ago
cache_kernels.cu	251568470e initial nvidia fp8 e4m3 for kv cache	7 months ago
cuda_compat.h	e3f2ea4850 make punica kernels work with rocm	7 months ago
cuda_utils.h	31c95011a6 feat: FP8 E5M2 KV Cache (#226)	1 year ago
cuda_utils_kernels.cu	31c95011a6 feat: FP8 E5M2 KV Cache (#226)	1 year ago
dispatch_utils.h	f8dfac6372 chore: attention refactor and upstream sync apr01 (#365)	11 months ago
layernorm_kernels.cu	9d81716bfd [v0.5.3] Release Candidate (#388)	10 months ago
ops.h	35ae01d7ba refactor: attention metadata term	7 months ago
pos_encoding_kernels.cu	e702f587cf feat: add batched RoPE kernels (#371)	11 months ago
pybind.cpp	2351a0e2cd feat: FlashInfer backend for decoding phase (#548)	7 months ago
reduction.cuh	9d81716bfd [v0.5.3] Release Candidate (#388)	10 months ago