AlpinDale
|
e13a66925c
feat: add fuyu vision model and persimmon language model support
|
6 months ago |
AlpinDale
|
1efd0f89b7
feat: support FP8 for DeepSeekV2 MoE
|
6 months ago |
AlpinDale
|
bf15e1b4e8
chore: deprecation warning for beam search
|
6 months ago |
AlpinDale
|
7e9d4f3c71
chore: some more marlin cleanups
|
6 months ago |
AlpinDale
|
34fc26c869
chore: bump lmfe version to 0.10.3
|
6 months ago |
AlpinDale
|
23408b9b2b
chore: skip the driver worker
|
6 months ago |
AlpinDale
|
d8f9f0ec16
fix: prefix prefill kernels for fp32 data type
|
6 months ago |
AlpinDale
|
92bcbdf975
fix: megacore setting for TPU v5e-litepod
|
6 months ago |
AlpinDale
|
0c17c2a8a7
chore: add commit hash, clean up engine logs
|
6 months ago |
AlpinDale
|
cdc0e498a9
fix: illegal memory access in FP8 MoE kernel
|
6 months ago |
AlpinDale
|
b1e61268a8
bump torch to 2.3.1
|
6 months ago |
AlpinDale
|
05e45aeb53
fix: dtype mismatch for paligemma
|
6 months ago |
AlpinDale
|
500f3b654f
fix: support bias term in compressed-tensors quant
|
6 months ago |
AlpinDale
|
d2f38f6f81
chore: remove separate bias add
|
6 months ago |
AlpinDale
|
ddb28a80a3
fix: bump torch for rocm, unify CUDA_VISIBLE_DEVICES for cuda and rocm
|
6 months ago |
AlpinDale
|
a2d476183f
fix: remove scipy and re-implement CSR matrix
|
6 months ago |
AlpinDale
|
5ac65d2d49
chore: bump optimum-intel
|
6 months ago |
AlpinDale
|
cc6399792f
fix: keep consistent with how pytorch finds libcudart.so
|
6 months ago |
AlpinDale
|
63becc67c0
fix: prompt logprob detokenization
|
6 months ago |
AlpinDale
|
0ab35652d3
fix: llava 1.6 feature size calculation
|
6 months ago |
AlpinDale
|
058e629f8e
chore: refactor marlin python utils
|
6 months ago |
AlpinDale
|
c0c2b1ac20
fix: get_and_reset only when scheduler outputs are not empty
|
6 months ago |
AlpinDale
|
b9268be8e8
fix: engine timeout due to request abort
|
6 months ago |
AlpinDale
|
8a44866e00
restrict outlines to < 0.1
|
6 months ago |
AlpinDale
|
4501ae5f15
fix: neuron executor for adapters
|
6 months ago |
AlpinDale
|
16dff9babc
chore: enable bonus token in spec decoding for KV cache based models
|
6 months ago |
AlpinDale
|
4150b1ea3a
fix: adapter methods for OpenVINO executor
|
6 months ago |
AlpinDale
|
db73f03cdc
fix: use ParallelLMHead for MLPSpeculator
|
6 months ago |
AlpinDale
|
9622c59f8f
chore: support 2D input shape in MoE layer
|
6 months ago |
AlpinDale
|
4628caeae6
fix: missed these adapter methods for TPU executor
|
6 months ago |