| Author | Commit | Message | Date |
|---|---|---|---|
| AlpinDale | 61aed092a5 | rocm: add support for FP8 KV cache in the custom paged attention kernels (#1066) | 5 days ago |
| AlpinDale | 9bdf8d5bfa | mamba: enable continuous batching for mamba kernels (#1055) | 1 week ago |
| AlpinDale | 239a8cae25 | torch.compile: register all-reduce operations as custom ops (#1050) | 1 week ago |
| AlpinDale | 8976805f90 | kernel: asymmetric AQ AZP quantization kernels (#1048) | 1 week ago |
| AlpinDale | 4a7cb8f232 | rocm: add custom paged attention kernels for ROCm (#1043) | 1 week ago |
| AlpinDale | 1390915778 | multi-step: add support for flashinfer attention backend (#1033) | 1 week ago |
| AlpinDale | a113309876 | kernel: add meta functions for ops to prevent graph breaks (#1019) | 1 week ago |
| AlpinDale | fcfcfc65e1 | quants: add triton kernels for AWQ (#946) | 2 weeks ago |
| AlpinDale | 9f3e7c86e2 | feat: add fused Marlin MoE kernel (#934) | 2 weeks ago |
| AlpinDale | 93bc863591 | feat: Machete Kernels for Hopper GPUs (#842) | 1 month ago |
| AlpinDale | bfc8988116 | feat: add cuda sampling kernels for top_k and top_p (#828) | 1 month ago |
| AlpinDale | f98e7b2f8c | feat: add HQQ quantization support (#795) | 2 months ago |
| AlpinDale | 73177656ed | feat: quant_llm support (#755) | 3 months ago |
| AlpinDale | ccbda97416 | fix: types in AQLM and GGUF for dynamo support (#736) | 3 months ago |
| AlpinDale | b0f262eec1 | feat: FP8 quantization support for AMD ROCm (#729) | 3 months ago |
| AlpinDale | 5d37ec1016 | suppress tpu import warning (#696) | 3 months ago |
| AlpinDale | a401f8e05d | feat: per-tensor token epilogue kernels (#630) | 4 months ago |
| AlpinDale | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |