AlpinDale | a985143768 | core: add cuda graph support for encoder-decoder models (#1051) | 1 week ago
AlpinDale | 271879a4a5 | fix: disable chunked prefill and prefix caching for multimodal models (#1037) | 1 week ago
AlpinDale | ddaefd8d38 | chore: remove engine_use_ray (#1024) | 1 week ago
AlpinDale | fe01e2ded8 | chore: move `device` keys to a constant (#1020) | 2 weeks ago
AlpinDale | 9a42869055 | chore: keep chunked prefill enabled with prefix caching (#1007) | 2 weeks ago
AlpinDale | 145e554a4d | neuron: add 8bit quantization for Neuron (#994) | 2 weeks ago
AlpinDale | 510ae5b949 | core: fix chunked prefill not being enabled by default for long contexts (#974) | 2 weeks ago
AlpinDale | b3f6eeb1d2 | vlm: increase the default `max_num_batched_tokens` for multimodal models (#973) | 2 weeks ago
AlpinDale | 8d9f1fd4e6 | feat: add single user mode (#927) | 2 weeks ago
AlpinDale | f7f3fed265 | feat: add async postprocessor (#925) | 2 weeks ago
AlpinDale | 0c6d90dade | neuron: add support for tensor parallelism (#923) | 3 weeks ago
AlpinDale | 22a4cd4595 | core: fix spec decode metrics and envs circular import (#889) | 3 weeks ago
AlpinDale | 901900854e | chore: consolidate environment variables within one file (#882) | 4 weeks ago
AlpinDale | 48a8693aed | feat: multi-step scheduling (#831) | 1 month ago
AlpinDale | 2f61644f6e | SPMD optimizations (#824) | 1 month ago
AlpinDale | f088ea81c7 | fix: --max-seq-len-to-capture arg (#818) | 1 month ago
AlpinDale | 0256ed236b | feat: windows support (#790) | 2 months ago
AlpinDale | dcb794a340 | fix: revert incorrect commit | 2 months ago
AlpinDale | 76367b5ae7 | wip | 2 months ago
AlpinDale | 7222b84582 | feat: ministral support (#776) | 2 months ago
AlpinDale | 73177656ed | feat: quant_llm support (#755) | 3 months ago
AlpinDale | 89a2c6dee1 | chore: refactor `MultiModalConfig` initialization and profiling (#745) | 3 months ago
AlpinDale | 28b6397188 | chore: quant config for speculative draft models (#719) | 3 months ago
AlpinDale | 008e646c7e | chore: add support for up to 2048 block size (#715) | 3 months ago
AlpinDale | 577586309d | chore: multi-step args and sequence modifications (#713) | 3 months ago
AlpinDale | 0b8b407b6d | feat: support profiling with multiple multi-modal inputs per prompt (#712) | 3 months ago
AlpinDale | d5033e12fd | feat: implement mistral tokenizer mode (#711) | 3 months ago
AlpinDale | 4fe371b7fa | fix: allow passing float for GiB arguments (#690) | 4 months ago
AlpinDale | bf88c8567e | feat: mamba model support (#674) | 4 months ago
AlpinDale | a0e446a17d | feat: initial encoder-decoder support with BART model (#633) | 4 months ago