f8dfac6372  chore: attention refactor and upstream sync apr01 (#365)  (AlpinDale, 9 months ago)
9810daa699  feat: INT8 KV Cache (#298)  (AlpinDale, 10 months ago)
23389d0108  zero out a variable instead of vector in kernels  (AlpinDale, 1 year ago)
081545bde6  fix: various CUDA kernel tweaks  (AlpinDale, 1 year ago)
05d0a7e763  feat: adapt the attention kernels  (AlpinDale, 1 year ago)
3c3944153c  feat: add generic attention and FP32 dtype kernels  (AlpinDale, 1 year ago)