AlpinDale | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 9 months ago
AlpinDale | d8c4193704 | feat: Speculative Decoding using a draft model (#432) | 9 months ago
AlpinDale | 8d26cf3876 | simplify model_executor logic | 9 months ago
AlpinDale | 4d33ce60da | feat: Triton flash attention backend for ROCm (#407) | 9 months ago
AlpinDale | 9aaeb5d349 | add speculative config and arg for later | 9 months ago
AlpinDale | 753f6dc51b | add v2 block manager | 9 months ago
AlpinDale | 7b9c08afae | vision model support | 9 months ago
AlpinDale | d1786645a3 | fix formatting | 9 months ago
AlpinDale | 2319b411ce | refactor: neuron support | 9 months ago
AlpinDale | 0f6d56b07f | feat: model executor refactor (#367) | 9 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 9 months ago