| Name | Commit | Message | Last updated |
|---|---|---|---|
| all_reduce | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |
| attention | 251568470e | initial nvidia fp8 e4m3 for kv cache | 7 months ago |
| backup | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 11 months ago |
| cpu | b746fb5562 | fix a few warnings on the cpu kernels | 7 months ago |
| hadamard | 5d288aa76c | feat: add fast hadamard transformation kernels (#232) | 1 year ago |
| moe | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |
| punica | d4edba99f9 | add lora dims for Qwen1.5-32B | 7 months ago |
| quantization | 2313c97e3d | add cutlass w8a8 kernels (#556) | 7 months ago |
| activation_kernels.cu | 3d6695cfbb | feat: add approximate gelu activation kernels (#370) | 11 months ago |
| cache.h | 251568470e | initial nvidia fp8 e4m3 for kv cache | 7 months ago |
| cache_kernels.cu | 251568470e | initial nvidia fp8 e4m3 for kv cache | 7 months ago |
| cuda_compat.h | e3f2ea4850 | make punica kernels work with rocm | 7 months ago |
| cuda_utils.h | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 1 year ago |
| cuda_utils_kernels.cu | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 1 year ago |
| dispatch_utils.h | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 11 months ago |
| layernorm_kernels.cu | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |
| ops.h | 35ae01d7ba | refactor: attention metadata term | 7 months ago |
| pos_encoding_kernels.cu | e702f587cf | feat: add batched RoPE kernels (#371) | 11 months ago |
| pybind.cpp | 2351a0e2cd | feat: FlashInfer backend for decoding phase (#548) | 7 months ago |
| reduction.cuh | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago |