AlpinDale | 7debd35ca2 | fix: shut down ray dag workers cleanly (#692) | 4 months ago
AlpinDale | ec32f999bc | build: bump cmake to 3.26 (#691) | 4 months ago
AlpinDale | 4fe371b7fa | fix: allow passing float for GiB arguments (#690) | 4 months ago
AlpinDale | 6144150398 | chore: use scalar type to dispatch to different `gptq_marlin` kernels (#689) | 4 months ago
AlpinDale | 24456206a9 | fix: logit softcapping in flash-attn (#688) | 4 months ago
AlpinDale | f3bfdfb923 | chore: use public ECR for neuron image (#687) | 4 months ago
AlpinDale | 3f712cd287 | feat: add progress bar for loading individual weight modules (#640) | 4 months ago
AlpinDale | 2573b36f6a | feat: allow image embeddings for VLM input (#686) | 4 months ago
AlpinDale | 300f889554 | chore: update flashinfer to v0.1.3 (#685) | 4 months ago
AlpinDale | 4ca9aaaf3c | build: add empty device (#684) | 4 months ago
AlpinDale | b03fa02397 | refactor: base worker input refactor for multi-step (#683) | 4 months ago
AlpinDale | 8cfbe62a7c | chore: bump lmfe to v0.10.6 and include triton for tpu and xpu dockerfiles (#682) | 4 months ago
AlpinDale | 06cd48ea5c | chore: use mark_dynamic to reduce TPU compile times (#681) | 4 months ago
AlpinDale | fa5553b20f | fix: phi3v batch inference with different aspect ratio images (#680) | 4 months ago
AlpinDale | 79d603954e | fix: chunked prefill with v2 block manager (#679) | 4 months ago
AlpinDale | 3bbb3f2086 | feat: add numpy implementation of `compute_slot_mapping` (#678) | 4 months ago
AlpinDale | df208ab4e9 | fix: fp8 checkpoints with fused linear modules (#677) | 4 months ago
AlpinDale | 81fa31bcaf | feat: embeddings support for batched OAI endpoint (#676) | 4 months ago
AlpinDale | c2bb886b2e | fix: reinit procedure in `ModelInputForGPUBuilder` (#675) | 4 months ago
AlpinDale | bf88c8567e | feat: mamba model support (#674) | 4 months ago
AlpinDale | 8583aefed7 | chore: mamba cache single buffer (#673) | 4 months ago
AlpinDale | 19ad952dd4 | chore: better stream termination in async engine (#672) | 4 months ago
AlpinDale | 1394008421 | chore: decouple `should_modify_greedy_probs_inplace` (#671) | 4 months ago
AlpinDale | 2da6a3ec2b | feat: option to apply temperature scaling last (#670) | 4 months ago
AlpinDale | e3a53712f2 | fix: mlpspeculator with padded vocab (#669) | 4 months ago
AlpinDale | e200775863 | feat: enable using fp8 kv and prefix caching with chunked prefill (#668) | 4 months ago
AlpinDale | ef40c05cd3 | fix: minor adjustments to scheduler and block manager (#667) | 4 months ago
AlpinDale | 7df7b8ca53 | optimization: reduce end-to-end overhead from python obj allocation (#666) | 4 months ago
AlpinDale | ea78357d70 | fix: deps with TPU dockerfile (#665) | 4 months ago
AlpinDale | 62111fab17 | feat: allow serving encoder-decoder models in the API server (#664) | 4 months ago