Author | Commit | Message | Date
AlpinDale | 4e4cd55d30 | fix: incorrect LoRA import | 7 months ago
AlpinDale | 99680b2d23 | feat: soft prompts (#589) | 7 months ago
AlpinDale | 1cb06835a0 | fix: TPU multimodal kwargs and outlines installation in TPU docker | 7 months ago
AlpinDale | 1562e073c6 | fix: ray worker rank assignment | 7 months ago
AlpinDale | 1a40bf438b | fix: incorrect GPU capability when using mixed GPUs | 7 months ago
AlpinDale | 3798ecc309 | chore: add flashinfer to default dockerfile | 7 months ago
AlpinDale | ebba0d9226 | fix: mamba cache cuda graph padding | 7 months ago
AlpinDale | c25a9abb28 | fix: outlines failing on second launch | 7 months ago
AlpinDale | 2105e4fd6b | feat: correctly invoke prefill & decode kernels for cross-attention | 7 months ago
AlpinDale | 3e7d5f7d14 | chore: reloading fused_moe config on the last chunk | 7 months ago
AlpinDale | 88a638d793 | chore: debug logs for all available endpoints | 7 months ago
AlpinDale | 98cb1c4cd1 | feat: support fp8 via `llm-compressor` | 7 months ago
AlpinDale | bf4f113ef1 | feat: add paligemma vision model support | 7 months ago
AlpinDale | 7e99578712 | fix: cleanup validation and update docs for vlm | 7 months ago
AlpinDale | 526163003d | fix: improve consistency between feature size calc and dummy data for profiling | 7 months ago
AlpinDale | c11a8bdaad | fix: calculate max number of multi-modal tokens automatically | 7 months ago
AlpinDale | 5761ef8c35 | feat: gemma-2 support | 7 months ago
AlpinDale | 151d782233 | fix: attention softcapping for flashinfer | 7 months ago
AlpinDale | a5fafaa9ce | chore: add more tuning for the CPU backend via intel-openmp | 7 months ago
Pyroserenus | ba7760d1f9 | Update Klite.embd (#588) | 7 months ago
AlpinDale | 27a28fae05 | chore: enable alibi for rocm flash attention | 7 months ago
AlpinDale | 4c3bb0b436 | fix: pipeline parallel on python 3.8 and 3.9 | 7 months ago
AlpinDale | 0061aea5d5 | fix: prevent contention amongst shards by setting OMP_NUM_THREADS=1 | 7 months ago
AlpinDale | 1ff6d4c3d7 | feat: support pipeline parallel on indivisible GPU count (#587) | 7 months ago
AlpinDale | 6e561ecda9 | chore: clean up `CompressedTensorsW8A8` | 7 months ago
AlpinDale | 4f7d212b70 | feat: remove vision language config | 7 months ago
AlpinDale | bdf1cc1aec | fix: allow using custom all reduce when pp_size > 1 | 7 months ago
AlpinDale | ad24e74a99 | feat: FP8 weight-only quantization support for Ampere GPUs | 7 months ago
AlpinDale | 5257ebce8c | fix: device >= 0 && device < num_gpus INTERNAL_ASSERT FAILED | 7 months ago
AlpinDale | 5240c0da23 | fix: avoid unnecessary ray import warnings | 7 months ago
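
For context on entries 5761ef8c35 (gemma-2 support) and 151d782233 (attention softcapping for flashinfer): Gemma-2 bounds attention logits with a tanh before the softmax. The snippet below is a minimal illustration of that transform, not the repository's FlashInfer kernel code, and the cap value is an assumption based on Gemma-2's published configuration.

```python
import torch


def soft_cap(scores: torch.Tensor, cap: float = 50.0) -> torch.Tensor:
    """Bound pre-softmax attention logits smoothly to (-cap, cap).

    The cap value here is illustrative; the model config decides the real one.
    """
    return cap * torch.tanh(scores / cap)


# Toy usage: large raw scores get squashed instead of dominating the softmax.
scores = 30.0 * torch.randn(1, 8, 16, 16)  # (batch, heads, q_len, kv_len)
probs = torch.softmax(soft_cap(scores), dim=-1)
```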
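For context on entry 0061aea5d5: setting OMP_NUM_THREADS=1 keeps co-located shards from each spawning a full set of OpenMP threads and contending for the same cores. The sketch below shows the general idea with a hypothetical per-shard initializer; it is not the repository's implementation.

```python
import os

# Must be set before importing numerical libraries that read it at import time.
os.environ.setdefault("OMP_NUM_THREADS", "1")

import torch  # noqa: E402


def init_shard(rank: int) -> None:
    """Hypothetical per-shard initializer (not the repository's code): with
    OMP_NUM_THREADS=1 each shard keeps to a single OpenMP thread instead of
    spawning one per core and contending with sibling shards on the host."""
    torch.set_num_threads(1)
    print(f"shard {rank}: OMP_NUM_THREADS={os.environ['OMP_NUM_THREADS']}")


if __name__ == "__main__":
    init_shard(rank=0)
```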