AlpinDale
|
05e45aeb53
fix: dtype mismatch for paligemma
|
6 months ago |
AlpinDale
|
500f3b654f
fix: support bias term in compressed-tensors quant
|
6 months ago |
AlpinDale
|
d2f38f6f81
chore: remove separate bias add
|
6 months ago |
AlpinDale
|
ddb28a80a3
fix: bump torch for rocm, unify CUDA_VISIBLE_DEVICES for cuda and rocm
|
6 months ago |
AlpinDale
|
a2d476183f
fix: remove scipy and re-implement CSR matrix
|
6 months ago |
AlpinDale
|
5ac65d2d49
chore: bump optimum-intel
|
6 months ago |
AlpinDale
|
cc6399792f
fix: keep consistent with how pytorch finds libcudart.so
|
6 months ago |
AlpinDale
|
63becc67c0
fix: prompt logprob detokenization
|
6 months ago |
AlpinDale
|
0ab35652d3
fix: llava 1.6 feature size calculation
|
6 months ago |
AlpinDale
|
058e629f8e
chore: refactor marlin python utils
|
6 months ago |
AlpinDale
|
c0c2b1ac20
fix: get_and_reset only when scheduler outputs are not empty
|
6 months ago |
AlpinDale
|
b9268be8e8
fix: engine timeout due to request abort
|
6 months ago |
AlpinDale
|
8a44866e00
restrict outlines to < 0.1
|
6 months ago |
AlpinDale
|
4501ae5f15
fix: neuron executor for adapters
|
6 months ago |
AlpinDale
|
16dff9babc
chore: enable bonus token in spec decoding for KV cache based models
|
6 months ago |
AlpinDale
|
4150b1ea3a
fix: adapter methods for OpenVINO executor
|
6 months ago |
AlpinDale
|
db73f03cdc
fix: use ParallelLMHead for MLPSpeculator
|
6 months ago |
AlpinDale
|
9622c59f8f
chore: support 2D input shape in MoE layer
|
6 months ago |
AlpinDale
|
4628caeae6
fix: missed these adapter methods for TPU executor
|
6 months ago |
AlpinDale
|
dba22e4f83
fix: add zeromq fallback for broadcasting large objects (e.g. vlm images)
|
6 months ago |
AlpinDale
|
d9f4c36edd
feat: Medusa speculative decoding support (#590)
|
6 months ago |
AlpinDale
|
6abf4e3883
fix: needs_scalar_to_array logic check in linear layer
|
6 months ago |
AlpinDale
|
a3b56353fa
fix: another one missed
|
6 months ago |
AlpinDale
|
4e4cd55d30
fix: incorrect LoRA import
|
6 months ago |
AlpinDale
|
99680b2d23
feat: soft prompts (#589)
|
6 months ago |
AlpinDale
|
1cb06835a0
fix: TPU multimodal kwargs and outlines installation in TPU docker
|
6 months ago |
AlpinDale
|
1562e073c6
fix: ray worker rank assignment
|
6 months ago |
AlpinDale
|
1a40bf438b
fix: incorrect gpu capability when using mixed gpus
|
6 months ago |
AlpinDale
|
3798ecc309
chore: add flashinfer to default dockerfile
|
6 months ago |
AlpinDale
|
ebba0d9226
fix: mamba cache cuda graph padding
|
6 months ago |