AlpinDale | b9b5e352cb | typo | 3 months ago
AlpinDale | 0e08cb1c12 | add ultravox config file | 3 months ago
AlpinDale | e81b8d1c52 | move input parsing to utils | 3 months ago
AlpinDale | 3693028340 | feat: support for Audio modality (#698) | 3 months ago
AlpinDale | 31483a7d3b | fix: manually install triton for other devices to prevent outlines errors (#697) | 3 months ago
AlpinDale | 5d37ec1016 | suppress tpu import warning (#696) | 3 months ago
AlpinDale | 0e558e9b2f | fix: loading chameleon model with TP>1 (#695) | 3 months ago
AlpinDale | 1d3a1fec47 | feat: add load/unload endpoints for soft-prompts (#694) | 3 months ago
AlpinDale | c34a6ac8e4 | feat: add lora loading/unloading api endpoint (#693) | 3 months ago
AlpinDale | 7debd35ca2 | fix: shut down ray dag workers cleanly (#692) | 3 months ago
AlpinDale | ec32f999bc | build: bump cmake to 3.26 (#691) | 3 months ago
AlpinDale | 4fe371b7fa | fix: allow passing float for GiB arguments (#690) | 3 months ago
AlpinDale | 6144150398 | chore: use scalar type to dispatch to different `gptq_marlin` kernels (#689) | 3 months ago
AlpinDale | 24456206a9 | fix: logit softcapping in flash-attn (#688) | 3 months ago
AlpinDale | f3bfdfb923 | chore: use public ECR for neuron image (#687) | 3 months ago
AlpinDale | 3f712cd287 | feat: add progress bar for loading individual weight modules (#640) | 3 months ago
AlpinDale | 2573b36f6a | feat: allow image embeddings for VLM input (#686) | 3 months ago
AlpinDale | 300f889554 | chore: update flashinfer to v0.1.3 (#685) | 3 months ago
AlpinDale | 4ca9aaaf3c | build: add empty device (#684) | 3 months ago
AlpinDale | b03fa02397 | refactor: base worker input refactor for multi-step (#683) | 3 months ago
AlpinDale | 8cfbe62a7c | chore: bump lmfe to v0.10.6 and include triton for tpu and xpu dockerfiles (#682) | 3 months ago
AlpinDale | 06cd48ea5c | chore: use mark_dynamic to reduce TPU compile times (#681) | 3 months ago
AlpinDale | fa5553b20f | fix: phi3v batch inference with different aspect ratio images (#680) | 3 months ago
AlpinDale | 79d603954e | fix: chunked prefill with v2 block manager (#679) | 3 months ago
AlpinDale | 3bbb3f2086 | feat: add numpy implementation of `compute_slot_mapping` (#678) | 3 months ago
AlpinDale | df208ab4e9 | fix: fp8 checkpoints with fused linear modules (#677) | 4 months ago
AlpinDale | 81fa31bcaf | feat: embeddings support for batched OAI endpoint (#676) | 4 months ago
AlpinDale | c2bb886b2e | fix: reinit procedure in `ModelInputForGPUBuilder` (#675) | 4 months ago
AlpinDale | bf88c8567e | feat: mamba model support (#674) | 4 months ago
AlpinDale | 8583aefed7 | chore: mamba cache single buffer (#673) | 4 months ago