| File | Commit | Commit message | Last updated |
| --- | --- | --- | --- |
| activation.cpp | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| attention.cpp | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| cache.cpp | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| cpu_types.hpp | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| cpu_types_vsx.hpp | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| cpu_types_x86.hpp | f2b6dc3872 | cpu: add support for W8A8 quantization via compressed-tensor (#1017) | 1 week ago |
| dnnl_helper.hpp | f2b6dc3872 | cpu: add support for W8A8 quantization via compressed-tensor (#1017) | 1 week ago |
| layernorm.cpp | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| pos_encoding.cpp | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| quant.cpp | 8976805f90 | kernel: asymmetric AQ AZP quantization kernels (#1048) | 1 week ago |
| torch_bindings.cpp | 8976805f90 | kernel: asymmetric AQ AZP quantization kernels (#1048) | 1 week ago |
| utils.cpp | f2b6dc3872 | cpu: add support for W8A8 quantization via compressed-tensor (#1017) | 1 week ago |