david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed, thanks to vLLM's Paged Attention. @ 7ca63930c8d5084e1a7c4bdd5050faf75eed921f

AlpinDale a113309876 kernel: add meta functions for ops to prevent graph breaks (#1019)		2 weeks ago
..
activation.cpp	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
attention.cpp	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
cache.cpp	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
cpu_types.hpp	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
cpu_types_vsx.hpp	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
cpu_types_x86.hpp	f2b6dc3872 cpu: add support for W8A8 quantization via compressed-tensor (#1017)	2 weeks ago
dnnl_helper.hpp	f2b6dc3872 cpu: add support for W8A8 quantization via compressed-tensor (#1017)	2 weeks ago
layernorm.cpp	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
pos_encoding.cpp	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
quant.cpp	f2b6dc3872 cpu: add support for W8A8 quantization via compressed-tensor (#1017)	2 weeks ago
torch_bindings.cpp	a113309876 kernel: add meta functions for ops to prevent graph breaks (#1019)	2 weeks ago
utils.cpp	f2b6dc3872 cpu: add support for W8A8 quantization via compressed-tensor (#1017)	2 weeks ago