AlpinDale
|
b03b4d4c8c
fix: compute cutlass 3.x epilogues in fp32 instead of 16
|
7 months ago |
AlpinDale
|
cdff8e89f9
feat: introduce `DraftModelRunner`
|
7 months ago |
AlpinDale
|
9868bb2290
chore: make it clear that '%' should NOT be in tensor dict keys
|
7 months ago |
AlpinDale
|
b8650ec51d
fix: better error message for MLPSpeculator
|
7 months ago |
AlpinDale
|
0886c361f4
feat: OpenVINO CPU backend (#576)
|
7 months ago |
AlpinDale
|
d63690a0df
chore: add fp8 examples
|
7 months ago |
AlpinDale
|
b6ff0623a6
chore: clean up branding
|
7 months ago |
AlpinDale
|
85ef2fe8b1
chore: clean up placeholder symbols
|
7 months ago |
AlpinDale
|
1852d18326
chore: clean up inference examples
|
7 months ago |
AlpinDale
|
e1c4cf1d50
chore: organize chat templates
|
7 months ago |
AlpinDale
|
c0c336aaa3
refactor: registry for processing model inputs; quick_gelu; clip model support
|
7 months ago |
AlpinDale
|
426a13ab73
fix: pass multi_modal_kwargs to CPU model runner
|
7 months ago |
AlpinDale
|
bb4da84623
fix: make sure multi modal kwargs can broadcast properly with ring buffer
|
7 months ago |
AlpinDale
|
d2461161ec
chore: optimize KV cache swapping for TPU
|
7 months ago |
AlpinDale
|
fad45609b8
chore: remove logical token blocks (turns out they are not needed)
|
7 months ago |
AlpinDale
|
b3643a7bd7
fix: min_tokens for when there are multiple eos tokens
|
7 months ago |
AlpinDale
|
51cfadeb29
fix: `MLPSpeculator` handling of `num_speculative_tokens`
|
7 months ago |
AlpinDale
|
c5d8028668
fix: no need to redefine supports_vision and supports_lora in model class
|
7 months ago |
AlpinDale
|
b81966c0da
fix: missed phi3v
|
7 months ago |
AlpinDale
|
56e0b8223c
chore: add base class for LoRA-supported models
|
7 months ago |
AlpinDale
|
bc5ac9584a
fix: make tensor_dict flattening/unflattening more generic
|
7 months ago |
AlpinDale
|
dead030abf
fix: cuda graph with MLPSpeculator
|
7 months ago |
AlpinDale
|
271a680026
feat: inference support for PowerPC ISA
|
7 months ago |
AlpinDale
|
8b626e4032
fix: cpu kv cache allocation for TPU
|
7 months ago |
AlpinDale
|
fcd58614f4
feat: support parallel sampling and swapping in TPU
|
7 months ago |
AlpinDale
|
5b464d36ea
feat: bias epilogue support for cutlass kernels
|
7 months ago |
AlpinDale
|
b16173b41e
chore: add minimum concurrency for XPU
|
7 months ago |
AlpinDale
|
af1286f9fa
fix: kv cache size calculation on TPUs
|
7 months ago |
AlpinDale
|
ecd4460d55
fix: support 2D inputs for embeddings
|
7 months ago |
AlpinDale
|
66be475aae
fix: shm broadcast when the queue size is full
|
7 months ago |