AlpinDale | 60ca1e1e5e | feat: add ngram prompt lookup decoding for speculative decoding (#438) | 9 months ago
AlpinDale | d8c4193704 | feat: Speculative Decoding using a draft model (#432) | 9 months ago
AlpinDale | 140ebac03e | fix the nsight profiling with ray | 9 months ago
AlpinDale | 8d26cf3876 | simplify model_executor logic | 9 months ago
AlpinDale | 4d33ce60da | feat: Triton flash attention backend for ROCm (#407) | 9 months ago
AlpinDale | 893c791152 | fix TP for llava | 9 months ago
AlpinDale | 9aaeb5d349 | add speculative config and arg for later | 9 months ago
AlpinDale | 10e708726e | enable multi-node inference | 10 months ago
AlpinDale | 753f6dc51b | add v2 block manager | 10 months ago
AlpinDale | 41f5af0426 | add python nccl wrapper, remove cupy | 10 months ago
AlpinDale | 7b9c08afae | vision model support | 10 months ago
AlpinDale | 2319b411ce | refactor: neuron support | 10 months ago
AlpinDale | 0f6d56b07f | feat: model executor refactor (#367) | 10 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 10 months ago