AlpinDale | fa15bad2ea | chore: minor AMD fixes | 5 months ago
AlpinDale | 22305c91e9 | refactor _prepare_model_input_tensor and attn metadata builder for most backends | 5 months ago
AlpinDale | 9d7beaa5b9 | chore: separate kv_scale into k_scale and v_scale | 5 months ago
AlpinDale | 2105e4fd6b | feat: correctly invoke prefill & decode kernels for cross-attention | 5 months ago
AlpinDale | 27a28fae05 | chore: enable alibi for rocm flash attention | 5 months ago
AlpinDale | 405bb74612 | Control plane comms refactor (#573) | 6 months ago
AlpinDale | 71a26f0998 | chore: use pytorch sdpa backend to do naive attention for rocm | 6 months ago
AlpinDale | 696f2cd59c | add phi3_small support with blocksparse attention | 6 months ago
AlpinDale | 1b86cf6164 | navi21 fallback to naive attention | 6 months ago
AlpinDale | 0c15965621 | fix fp8 kv | 6 months ago
AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 6 months ago
AlpinDale | 50b7c13db0 | refactor: attention selector (#552) | 6 months ago
AlpinDale | 8b56dc4347 | dict -> torch.Tensor for blocks_to_swap | 6 months ago
AlpinDale | 3a0d1c7705 | add get_name method to attention backends | 6 months ago
AlpinDale | 21ce19b3ea | blocks_to_copy dict -> torch.Tensor | 6 months ago
AlpinDale | 35ae01d7ba | refactor: attention metadata term | 6 months ago
AlpinDale | aa15d3dc0f | sliding window in prefix prefill kernel | 6 months ago
AlpinDale | 0d3562a7f9 | MQA in triton FA | 6 months ago
AlpinDale | 0f7ef9ef7c | fix: import in selector | 6 months ago
AlpinDale | 46159b107a | formatting: pt1 | 7 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 7 months ago
AlpinDale | 66b7bc4415 | sliding window in prefix kernel | 8 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 9 months ago