Commit History

Author SHA1 Message Date
  AlpinDale f5bbf07c90 chore: use the `compressed-tensors` library to avoid code reuse (#704) 6 months ago
  AlpinDale 2d044af0e1 chore: spawn engine process from api server process (#703) 6 months ago
  AlpinDale edec2e9a9e feat: migrate awq and awq_marlin to AphroditeParameter (#702) 6 months ago
  AlpinDale 0c6f03b7e4 feat: add Solar model support (#701) 6 months ago
  AlpinDale 4ec08af18b chore: update fused MoE weight loading (#700) 6 months ago
  AlpinDale 4f6020cc86 chore: migrate gptq_marlin to AphroditeParameters (#699) 6 months ago
  AlpinDale 3693028340 feat: support for Audio modality (#698) 6 months ago
  AlpinDale 31483a7d3b fix: manually install triton for other devices to prevent outlines errors (#697) 6 months ago
  AlpinDale 5d37ec1016 suppress tpu import warning (#696) 6 months ago
  AlpinDale 0e558e9b2f fix: loading chameleon model with TP>1 (#695) 6 months ago
  AlpinDale 1d3a1fec47 feat: add load/unload endpoints for soft-prompts (#694) 6 months ago
  AlpinDale c34a6ac8e4 feat: add lora loading/unloading api endpoint (#693) 6 months ago
  AlpinDale 7debd35ca2 fix: shut down ray dag workers cleanly (#692) 6 months ago
  AlpinDale ec32f999bc build: bump cmake to 3.26 (#691) 6 months ago
  AlpinDale 4fe371b7fa fix: allow passing float for GiB arguments (#690) 6 months ago
  AlpinDale 6144150398 chore: use scalar type to dispatch to different `gptq_marlin` kernels (#689) 6 months ago
  AlpinDale 24456206a9 fix: logit softcapping in flash-attn (#688) 6 months ago
  AlpinDale f3bfdfb923 chore: use public ECR for neuron image (#687) 6 months ago
  AlpinDale 3f712cd287 feat: add progress bar for loading individual weight modules (#640) 6 months ago
  AlpinDale 2573b36f6a feat: allow image embeddings for VLM input (#686) 6 months ago
  AlpinDale 300f889554 chore: update flashinfer to v0.1.3 (#685) 6 months ago
  AlpinDale 4ca9aaaf3c build: add empty device (#684) 6 months ago
  AlpinDale b03fa02397 refactor: base worker input refactor for multi-step (#683) 6 months ago
  AlpinDale 8cfbe62a7c chore: bump lmfe to v0.10.6 and include triton for tpu and xpu dockerfiles (#682) 6 months ago
  AlpinDale 06cd48ea5c chore: use mark_dynamic to reduce TPU compile times (#681) 6 months ago
  AlpinDale fa5553b20f fix: phi3v batch inference with different aspect ratio images (#680) 6 months ago
  AlpinDale 79d603954e fix: chunked prefill with v2 block manager (#679) 6 months ago
  AlpinDale 3bbb3f2086 feat: add numpy implementation of `compute_slot_mapping` (#678) 6 months ago
  AlpinDale df208ab4e9 fix: fp8 checkpoints with fused linear modules (#677) 6 months ago
  AlpinDale 81fa31bcaf feat: embeddings support for batched OAI endpoint (#676) 6 months ago