| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| AlpinDale | 53d391e1f2 | merge 'dev' into 'main' | 1 year ago |
| AlpinDale | 15a0454172 | feat: FP8 KV Cache (#185) | 1 year ago |
| AlpinDale | b9b295d74e | chore: backlogs 1 (#191) | 1 year ago |
| AlpinDale | f013d714c0 | chore: merge dev branch into main (#177) | 1 year ago |
| AlpinDale | 7d91e9e0f2 | feat: CUDA graphs (#172) | 1 year ago |
| AlpinDale | 02f3ab3501 | fix: replace head_mapping with num_kv_heads (#161) | 1 year ago |
| AlpinDale | 2755a48d51 | merge dev branch into main (#153) | 1 year ago |
| AlpinDale | 1334a833a4 | feat: AMD ROCm support (#95) | 1 year ago |
| AlpinDale | 4e71bd1d12 | feat: add PagedAttention V2 kernels (#76) | 1 year ago |
| AlpinDale | b7918ad45f | fix: attention kernel attribute (#52) | 1 year ago |
| AlpinDale | 0495c50a3e | GPTQ+exllama support (#21) | 1 year ago |
| AlpinDale | 75c27d3e65 | massive overhaul | 1 year ago |
| AlpinDale | 45f6d9f923 | initial refactor commit | 1 year ago |
| AlpinDale | 23389d0108 | zero out a variable instead of vector in kernels | 1 year ago |
| AlpinDale | ed540c3c87 | fix: typo in attention kernel | 1 year ago |
| AlpinDale | fffb9f2dac | chore: attention kernel optimizations | 1 year ago |
| AlpinDale | 24c78e7306 | optimization: multi-query attention kernel | 1 year ago |
| AlpinDale | 081545bde6 | fix: various CUDA kernel tweaks | 1 year ago |
| AlpinDale | 05d0a7e763 | feat: adapt the attention kernels | 1 year ago |