AlpinDale | a985143768 | core: add cuda graph support for encoder-decoder models (#1051) | 4 weeks ago
AlpinDale | c951a54d21 | fix: multi-step + flashinfer with cuda graphs (#1036) | 4 weeks ago
AlpinDale | 1390915778 | multi-step: add support for flashinfer attention backend (#1033) | 4 weeks ago
AlpinDale | 3bb0f07461 | chore: rename `task_handler` to `worker` (#985) | 1 month ago
AlpinDale | d6cbbba95f | Revert "fix: issues with flashinfer fp8 kv (#950)" (#956) | 1 month ago
AlpinDale | cef6da8863 | fix: issues with flashinfer fp8 kv (#950) | 1 month ago
AlpinDale | 4ddc14d653 | core: use flashinfer for FP8 KV when available (#944) | 1 month ago
AlpinDale | 1405051912 | attention: add `AttentionState` abstraction (#863) | 1 month ago
AlpinDale | 60b702a827 | chore: register custom torch ops for flash-attn and flashinfer (#724) | 4 months ago
AlpinDale | 300f889554 | chore: update flashinfer to v0.1.3 (#685) | 4 months ago
AlpinDale | 67ee885293 | fix: flashinfer outputs (#657) | 4 months ago
AlpinDale | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago