AlpinDale
|
dba22e4f83
fix: add zeromq fallback for broadcasting large objects (e.g. vlm images)
|
6 months ago |
AlpinDale
|
d9f4c36edd
feat: Medusa speculative decoding support (#590)
|
6 months ago |
AlpinDale
|
6abf4e3883
fix: needs_scalar_to_array logic check in linear layer
|
6 months ago |
AlpinDale
|
a3b56353fa
fix: another one missed
|
6 months ago |
AlpinDale
|
4e4cd55d30
fix: incorrect LoRA import
|
6 months ago |
AlpinDale
|
99680b2d23
feat: soft prompts (#589)
|
6 months ago |
AlpinDale
|
1cb06835a0
fix: TPU multimodal kwargs and outlines installation in TPU docker
|
6 months ago |
AlpinDale
|
1562e073c6
fix: ray worker rank assignment
|
6 months ago |
AlpinDale
|
1a40bf438b
fix: incorrect gpu capability when using mixed gpus
|
6 months ago |
AlpinDale
|
3798ecc309
chore: add flashinfer to default dockerfile
|
6 months ago |
AlpinDale
|
ebba0d9226
fix: mamba cache cuda graph padding
|
6 months ago |
AlpinDale
|
c25a9abb28
fix: outlines failing on second launch
|
6 months ago |
AlpinDale
|
2105e4fd6b
feat: correctly invoke prefill & decode kernels for cross-attention
|
6 months ago |
AlpinDale
|
3e7d5f7d14
chore: reloading fused_moe config on the last chunk
|
6 months ago |
AlpinDale
|
88a638d793
chore: debug logs for all available endpoints
|
6 months ago |
AlpinDale
|
98cb1c4cd1
feat: support fp8 via `llm-compressor`
|
6 months ago |
AlpinDale
|
bf4f113ef1
feat: add paligemma vision model support
|
6 months ago |
AlpinDale
|
7e99578712
fix: cleanup validation and update docs for vlm
|
6 months ago |
AlpinDale
|
526163003d
fix: improve consistency between feature size calc and dummy data for profiling
|
6 months ago |
AlpinDale
|
c11a8bdaad
fix: calculate max number of multi-modal tokens automatically
|
6 months ago |
AlpinDale
|
5761ef8c35
feat: gemma-2 support
|
6 months ago |
AlpinDale
|
151d782233
fix: attention softcapping for flashinfer
|
6 months ago |
AlpinDale
|
a5fafaa9ce
chore: add more tuning for the CPU backend via intel-openmp
|
6 months ago |
Pyroserenus
|
ba7760d1f9
Update Klite.embd (#588)
|
6 months ago |
AlpinDale
|
27a28fae05
chore: enable alibi for rocm flash attention
|
6 months ago |
AlpinDale
|
4c3bb0b436
fix: pipeline parallel on python 3.8 and 3.9
|
6 months ago |
AlpinDale
|
0061aea5d5
fix: prevent contention amongst shards by setting OMP_NUM_THREADS=1
|
6 months ago |
AlpinDale
|
1ff6d4c3d7
feat: support pipeline parallel on indivisible GPU count (#587)
|
6 months ago |
AlpinDale
|
6e561ecda9
chore: clean up `CompressedTensorsW8A8`
|
6 months ago |
AlpinDale
|
4f7d212b70
feat: remove vision language config
|
6 months ago |