AlpinDale | 815736fc54 | feat: add cuda kernels for sampling | 4 months ago
AlpinDale | ba371fbbbd | feat: AWQ marlin kernels (#603) | 4 months ago
AlpinDale | c8f5424d72 | add scale_ub inputs to fp8 dynamic per-token quant | 4 months ago
AlpinDale | 196e6b64f1 | feat: add fp8 dynamic per-token quant kernel | 4 months ago
AlpinDale | dd18c5042c | move prepare_inputs to the GPU (#596) | 4 months ago
AlpinDale | f5d52320da | Port mamba kernels to Aphrodite (#595) | 4 months ago
AlpinDale | 9d7beaa5b9 | chore: separate kv_scale into k_scale and v_scale | 4 months ago
AlpinDale | ad24e74a99 | feat: FP8 weight-only quantization support for Ampere GPUs | 4 months ago
AlpinDale | 5be90c3859 | Mamba infrastructure support (#586) | 4 months ago
AlpinDale | c0c336aaa3 | refactor: registry for processing model inputs; quick_gelu; clip model support | 5 months ago
AlpinDale | 5b464d36ea | feat: bias epilogue support for cutlass kernels | 5 months ago
AlpinDale | cd9ed8623b | fix: cuda version check for fp8 support in the cutlass kernels | 5 months ago
AlpinDale | 7e54c3916d | chore: factor out epilogues from cutlass kernels | 5 months ago
AlpinDale | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 5 months ago