Author | Commit | Message | Date
AlpinDale | 4e4cd55d30 | fix: incorrect LoRA import | 7 months ago
AlpinDale | 99680b2d23 | feat: soft prompts (#589) | 7 months ago
AlpinDale | 1cb06835a0 | fix: TPU multimodal kwargs and outlines installation in TPU docker | 7 months ago
AlpinDale | 1562e073c6 | fix: ray worker rank assignment | 7 months ago
AlpinDale | 1a40bf438b | fix: incorrect GPU capability when using mixed GPUs | 7 months ago
AlpinDale | 3798ecc309 | chore: add flashinfer to default dockerfile | 7 months ago
AlpinDale | ebba0d9226 | fix: mamba cache cuda graph padding | 7 months ago
AlpinDale | c25a9abb28 | fix: outlines failing on second launch | 7 months ago
AlpinDale | 2105e4fd6b | feat: correctly invoke prefill & decode kernels for cross-attention | 7 months ago
AlpinDale | 3e7d5f7d14 | chore: reloading fused_moe config on the last chunk | 7 months ago
AlpinDale | 88a638d793 | chore: debug logs for all available endpoints | 7 months ago
AlpinDale | 98cb1c4cd1 | feat: support fp8 via `llm-compressor` | 7 months ago
AlpinDale | bf4f113ef1 | feat: add paligemma vision model support | 7 months ago
AlpinDale | 7e99578712 | fix: cleanup validation and update docs for vlm | 7 months ago
AlpinDale | 526163003d | fix: improve consistency between feature size calc and dummy data for profiling | 7 months ago
AlpinDale | c11a8bdaad | fix: calculate max number of multi-modal tokens automatically | 7 months ago
AlpinDale | 5761ef8c35 | feat: gemma-2 support | 7 months ago
AlpinDale | 151d782233 | fix: attention softcapping for flashinfer | 7 months ago
AlpinDale | a5fafaa9ce | chore: add more tuning for the CPU backend via intel-openmp | 7 months ago
Pyroserenus | ba7760d1f9 | Update Klite.embd (#588) | 7 months ago
AlpinDale | 27a28fae05 | chore: enable alibi for rocm flash attention | 7 months ago
AlpinDale | 4c3bb0b436 | fix: pipeline parallel on python 3.8 and 3.9 | 7 months ago
AlpinDale | 0061aea5d5 | fix: prevent contention amongst shards by setting OMP_NUM_THREADS=1 | 7 months ago
AlpinDale | 1ff6d4c3d7 | feat: support pipeline parallel on indivisible GPU count (#587) | 7 months ago
AlpinDale | 6e561ecda9 | chore: clean up `CompressedTensorsW8A8` | 7 months ago
AlpinDale | 4f7d212b70 | feat: remove vision language config | 7 months ago
AlpinDale | bdf1cc1aec | fix: allow using custom all reduce when pp_size > 1 | 7 months ago
AlpinDale | ad24e74a99 | feat: FP8 weight-only quantization support for Ampere GPUs | 7 months ago
AlpinDale | 5257ebce8c | fix: device >= 0 && device < num_gpus INTERNAL_ASSERT FAILED | 7 months ago
AlpinDale | 5240c0da23 | fix: avoid unnecessary ray import warnings | 7 months ago
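
For context on entries 5761ef8c35 (gemma-2 support) and 151d782233 (attention softcapping for flashinfer): Gemma-2 bounds attention logits with a tanh before the softmax. The snippet below is a minimal illustration of that transform, not the repository's FlashInfer kernel code, and the cap value is an assumption based on Gemma-2's published configuration.

```python
import torch


def soft_cap(scores: torch.Tensor, cap: float = 50.0) -> torch.Tensor:
    """Bound pre-softmax attention logits smoothly to (-cap, cap).

    The cap value here is illustrative; the model config decides the real one.
    """
    return cap * torch.tanh(scores / cap)


# Toy usage: large raw scores get squashed instead of dominating the softmax.
scores = 30.0 * torch.randn(1, 8, 16, 16)  # (batch, heads, q_len, kv_len)
probs = torch.softmax(soft_cap(scores), dim=-1)
```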
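For context on entry 0061aea5d5: setting OMP_NUM_THREADS=1 keeps co-located shards from each spawning a full set of OpenMP threads and contending for the same cores. The sketch below shows the general idea with a hypothetical per-shard initializer; it is not the repository's implementation.

```python
import os

# Must be set before importing numerical libraries that read it at import time.
os.environ.setdefault("OMP_NUM_THREADS", "1")

import torch  # noqa: E402


def init_shard(rank: int) -> None:
    """Hypothetical per-shard initializer (not the repository's code): with
    OMP_NUM_THREADS=1 each shard keeps to a single OpenMP thread instead of
    spawning one per core and contending with sibling shards on the host."""
    torch.set_num_threads(1)
    print(f"shard {rank}: OMP_NUM_THREADS={os.environ['OMP_NUM_THREADS']}")


if __name__ == "__main__":
    init_shard(rank=0)
```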