Author     Commit      Date          Message
AlpinDale  8adc496a2a  5 months ago  fix: use paged attention for block swapping/copying in flashinfer
AlpinDale  22305c91e9  5 months ago  refactor _prepare_model_input_tensor and attn metadata builder for most backends
AlpinDale  9d7beaa5b9  5 months ago  chore: separate kv_scale into k_scale and v_scale
AlpinDale  2105e4fd6b  5 months ago  feat: correctly invoke prefill & decode kernels for cross-attention
AlpinDale  151d782233  5 months ago  fix: attention softcapping for flashinfer
AlpinDale  ca6b69966d  6 months ago  fix: explicit end_forward() calls to flashinfer
AlpinDale  b6e60143e7  6 months ago  Flashinfer for prefill phase (#580)
AlpinDale  b6ff0623a6  6 months ago  chore: clean up branding
AlpinDale  405bb74612  6 months ago  Control plane comms refactor (#573)
AlpinDale  156f577f79  6 months ago  feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)
AlpinDale  696f2cd59c  6 months ago  add phi3_small support with blocksparse attention
AlpinDale  0c15965621  6 months ago  fix fp8 kv
AlpinDale  a94de94c44  6 months ago  refactor: combine the prefill and decode into a single API (#553)
AlpinDale  50b7c13db0  6 months ago  refactor: attention selector (#552)
AlpinDale  d11d68f4e6  6 months ago  switch to vllm-flash-attn
AlpinDale  8b56dc4347  6 months ago  dict -> torch.Tensor for blocks_to_swap
AlpinDale  3a0d1c7705  6 months ago  add get_name method to attention backends
AlpinDale  21ce19b3ea  6 months ago  blocks_to_copy dict -> torch.Tensor
AlpinDale  2351a0e2cd  6 months ago  feat: FlashInfer backend for decoding phase (#548)
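Two entries above, 8b56dc4347 (blocks_to_swap) and 21ce19b3ea (blocks_to_copy), replace Python dict bookkeeping with torch.Tensor. A minimal sketch of what that representation change looks like, assuming the old dict mapped a source block number to a list of destination block numbers and the new form is a [num_pairs, 2] tensor of (source, destination) pairs; the names and shapes here are illustrative, not the repository's exact API:

import torch

# Before (assumed): block copies tracked as a Python dict of
# {source_block_number: [destination_block_numbers]}
blocks_to_copy_dict = {0: [3], 1: [4, 5]}

# After (assumed): the same mapping flattened into a [num_pairs, 2]
# tensor of (source, destination) pairs.
pairs = [(src, dst)
         for src, dsts in blocks_to_copy_dict.items()
         for dst in dsts]
blocks_to_copy = torch.tensor(pairs, dtype=torch.int64)

print(blocks_to_copy)
# tensor([[0, 3],
#         [1, 4],
#         [1, 5]])

The usual motivation for a change like this is that a flat tensor can be handed to a single copy kernel in one launch, instead of iterating a dict on the Python side for every block.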