Commit History

Author      SHA1        Message                                                                          Date
AlpinDale   4e4cd55d30  fix: incorrect LoRA import                                                       7 months ago
AlpinDale   99680b2d23  feat: soft prompts (#589)                                                        7 months ago
AlpinDale   1cb06835a0  fix: TPU multimodal kwargs and outlines installation in TPU docker               7 months ago
AlpinDale   1562e073c6  fix: ray worker rank assigment                                                   7 months ago
AlpinDale   1a40bf438b  fix: incorrect gpu capability when used mixed gpus                               7 months ago
AlpinDale   3798ecc309  chore: add flashinfer to default dockerfile                                      7 months ago
AlpinDale   ebba0d9226  fix: mamba cache cuda graph padding                                              7 months ago
AlpinDale   c25a9abb28  fix: outlines failing on second launch                                           7 months ago
AlpinDale   2105e4fd6b  feat: correctly invoke prefill & decode kernels for cross-attention              7 months ago
AlpinDale   3e7d5f7d14  chore: reloading fused_moe config on the last chunk                              7 months ago
AlpinDale   88a638d793  chore: debug logs for all available endpoints                                    7 months ago
AlpinDale   98cb1c4cd1  feat: support fp8 via `llm-compressor`                                           7 months ago
AlpinDale   bf4f113ef1  feat: add paligemma vision model support                                         7 months ago
AlpinDale   7e99578712  fix: cleanup validation and update docs for vlm                                  7 months ago
AlpinDale   526163003d  fix: improve consistency between feature size calc and dummy data for profiling  7 months ago
AlpinDale   c11a8bdaad  fix: calculate max number of multi-modal tokens automatically                    7 months ago
AlpinDale   5761ef8c35  feat: gemma-2 support                                                            7 months ago
AlpinDale   151d782233  fix: attention softcapping for flashinfer                                        7 months ago
AlpinDale   a5fafaa9ce  chore: add more tuning for the CPU backend via intel-openmp                      7 months ago
Pyroserenus ba7760d1f9  Update Klite.embd (#588)                                                         7 months ago
AlpinDale   27a28fae05  chore: enable alibi for rocm flash attention                                     7 months ago
AlpinDale   4c3bb0b436  fix: pipeline parallel on python 3.8 and 3.9                                     7 months ago
AlpinDale   0061aea5d5  fix: prevent contention amongst shards by setting OMP_NUM_THREADS=1              7 months ago
AlpinDale   1ff6d4c3d7  feat: support pipeline parallel on indivisible GPU count (#587)                  7 months ago
AlpinDale   6e561ecda9  chore: clean up `CompressedTensorsW8A8`                                          7 months ago
AlpinDale   4f7d212b70  feat: remove vision language config                                              7 months ago
AlpinDale   bdf1cc1aec  fix: allow using custom all reduce when pp_size > 1                              7 months ago
AlpinDale   ad24e74a99  feat: FP8 weight-only quantization support for Ampere GPUs                       7 months ago
AlpinDale   5257ebce8c  fix: device >= 0 && device < num_gpus INTERNAL_ASSERT FAILED                     7 months ago
AlpinDale   5240c0da23  fix: avoid unnecessary ray import warnings                                       7 months ago