696f2cd59c  add phi3_small support with blocksparse attention   AlpinDale  7 months ago
19a959a03e  prioritize user selection for attention              AlpinDale  7 months ago
b8b63eb5ca  fix head_size check for flash attention backend      AlpinDale  7 months ago
93cffaf446  add flash_attn back                                  AlpinDale  7 months ago
01190e5049  use flash attention for the decoding phase           AlpinDale  7 months ago
50b7c13db0  refactor: attention selector (#552)                  AlpinDale  7 months ago
d11d68f4e6  switch to vllm-flash-attn                            AlpinDale  7 months ago
2351a0e2cd  feat: FlashInfer backend for decoding phase (#548)   AlpinDale  7 months ago
fca911ee0a  vLLM Upstream Sync (#526)                            AlpinDale  8 months ago
9d81716bfd  [v0.5.3] Release Candidate (#388)                    AlpinDale  10 months ago