AlpinDale
|
058e629f8e
chore: refactor marlin python utils
|
6 months ago |
AlpinDale
|
c0c2b1ac20
fix: get_and_reset only when scheduler outputs are not empty
|
6 months ago |
AlpinDale
|
b9268be8e8
fix: engine timeout due to request abort
|
6 months ago |
AlpinDale
|
8a44866e00
restrict outlines to < 0.1
|
6 months ago |
AlpinDale
|
4501ae5f15
fix: neuron executor for adapters
|
6 months ago |
AlpinDale
|
16dff9babc
chore: enable bonus token in spec decoding for KV cache based models
|
6 months ago |
AlpinDale
|
4150b1ea3a
fix: adapter methods for OpenVINO executor
|
6 months ago |
AlpinDale
|
db73f03cdc
fix: use ParallelLMHead for MLPSpeculator
|
6 months ago |
AlpinDale
|
9622c59f8f
chore: support 2D input shape in MoE layer
|
6 months ago |
AlpinDale
|
4628caeae6
fix: missed these adapter methods for TPU executor
|
6 months ago |
AlpinDale
|
dba22e4f83
fix: add zeromq fallback for broadcasting large objects (e.g. vlm images)
|
6 months ago |
AlpinDale
|
d9f4c36edd
feat: Medusa speculative decoding support (#590)
|
6 months ago |
AlpinDale
|
6abf4e3883
fix: needs_scalar_to_array logic check in linear layer
|
6 months ago |
AlpinDale
|
a3b56353fa
fix: another one missed
|
6 months ago |
AlpinDale
|
4e4cd55d30
fix: incorrect LoRA import
|
6 months ago |
AlpinDale
|
99680b2d23
feat: soft prompts (#589)
|
6 months ago |
AlpinDale
|
1cb06835a0
fix: TPU multimodal kwargs and outlines installation in TPU docker
|
6 months ago |
AlpinDale
|
1562e073c6
fix: ray worker rank assignment
|
6 months ago |
AlpinDale
|
1a40bf438b
fix: incorrect gpu capability when using mixed gpus
|
6 months ago |
AlpinDale
|
3798ecc309
chore: add flashinfer to default dockerfile
|
6 months ago |
AlpinDale
|
ebba0d9226
fix: mamba cache cuda graph padding
|
6 months ago |
AlpinDale
|
c25a9abb28
fix: outlines failing on second launch
|
6 months ago |
AlpinDale
|
2105e4fd6b
feat: correctly invoke prefill & decode kernels for cross-attention
|
6 months ago |
AlpinDale
|
3e7d5f7d14
chore: reloading fused_moe config on the last chunk
|
6 months ago |
AlpinDale
|
88a638d793
chore: debug logs for all available endpoints
|
6 months ago |
AlpinDale
|
98cb1c4cd1
feat: support fp8 via `llm-compressor`
|
6 months ago |
AlpinDale
|
bf4f113ef1
feat: add paligemma vision model support
|
6 months ago |
AlpinDale
|
7e99578712
fix: cleanup validation and update docs for vlm
|
6 months ago |
AlpinDale
|
526163003d
fix: improve consistency between feature size calc and dummy data for profiling
|
6 months ago |
AlpinDale
|
c11a8bdaad
fix: calculate max number of multi-modal tokens automatically
|
6 months ago |