david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to vLLM's Paged Attention). @ sampling-kernels

AlpinDale 815736fc54 feat: add cuda kernels for sampling		4 months ago
..
all_reduce	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	5 months ago
attention	9d7beaa5b9 chore: separate kv_scale into k_scale and v_scale	4 months ago
backup	f8dfac6372 chore: attention refactor and upstream sync apr01 (#365)	9 months ago
cpu	9d7beaa5b9 chore: separate kv_scale into k_scale and v_scale	4 months ago
hadamard	5d288aa76c feat: add fast hadamard transformation kernels (#232)	11 months ago
mamba	f5d52320da Port mamba kernels to Aphrodite (#595)	4 months ago
moe	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	5 months ago
prepare_inputs	dd18c5042c move prepare_inputs to the GPU (#596)	4 months ago
punica	4f42985b5c feat: qwen2 lora shapes	5 months ago
quantization	ba371fbbbd feat: AWQ marlin kernels (#603)	4 months ago
sampling	815736fc54 feat: add cuda kernels for sampling	4 months ago
activation_kernels.cu	c0c336aaa3 refactor: registry for processing model inputs; quick_gelu; clip model support	5 months ago
cache.h	9d7beaa5b9 chore: separate kv_scale into k_scale and v_scale	4 months ago
cache_kernels.cu	9d7beaa5b9 chore: separate kv_scale into k_scale and v_scale	4 months ago
cuda_compat.h	00acf371f9 rocm: fused topk softmax	5 months ago
cuda_utils.h	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	5 months ago
cuda_utils_kernels.cu	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	5 months ago
dispatch_utils.h	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	5 months ago
layernorm_kernels.cu	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	5 months ago
ops.h	815736fc54 feat: add cuda kernels for sampling	4 months ago
pos_encoding_kernels.cu	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	5 months ago
reduction.cuh	aba03b4756 feat: dynamic per-token activation quantization	5 months ago
registration.h	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	5 months ago
torch_bindings.cpp	815736fc54 feat: add cuda kernels for sampling	4 months ago