AlpinDale | 008e646c7e | chore: add support for up to 2048 block size (#715) | 3 months ago
AlpinDale | 1c519cc6ac | chore: set per-rank XLA cache for TPU (#714) | 3 months ago
AlpinDale | 577586309d | chore: multi-step args and sequence modifications (#713) | 3 months ago
AlpinDale | 0b8b407b6d | feat: support profiling with multiple multi-modal inputs per prompt (#712) | 3 months ago
AlpinDale | d5033e12fd | feat: implement mistral tokenizer mode (#711) | 3 months ago
AlpinDale | ebf01d665b | fix: disable embeddings API for chat models (#710) | 3 months ago
AlpinDale | 198029295c | fix: empty sampler output when temperature is too low (#709) | 3 months ago
AlpinDale | 8878e3a63f | fix: import ray under a guard (#708) | 3 months ago
AlpinDale | 36241f98af | feat: add support for multi-host tpu (#707) | 3 months ago
AlpinDale | 04da8c33bd | Revert "chore: use the `compressed-tensors` library to avoid code reuse (#704)" (#706) | 3 months ago
AlpinDale | f76f2a5af0 | feat: add aphrodite plugin system (#705) | 3 months ago
AlpinDale | f5bbf07c90 | chore: use the `compressed-tensors` library to avoid code reuse (#704) | 3 months ago
AlpinDale | 2d044af0e1 | chore: spawn engine process from api server process (#703) | 3 months ago
AlpinDale | edec2e9a9e | feat: migrate awq and awq_marlin to AphroditeParameter (#702) | 3 months ago
AlpinDale | 0c6f03b7e4 | feat: add Solar model support (#701) | 3 months ago
AlpinDale | 4ec08af18b | chore: update fused MoE weight loading (#700) | 3 months ago
AlpinDale | 4f6020cc86 | chore: migrate gptq_marlin to AphroditeParameters (#699) | 3 months ago
AlpinDale | 3693028340 | feat: support for Audio modality (#698) | 3 months ago
AlpinDale | 31483a7d3b | fix: manually install triton for other devices to prevent outlines errors (#697) | 3 months ago
AlpinDale | 5d37ec1016 | suppress tpu import warning (#696) | 3 months ago
AlpinDale | 0e558e9b2f | fix: loading chameleon model with TP>1 (#695) | 3 months ago
AlpinDale | 1d3a1fec47 | feat: add load/unload endpoints for soft-prompts (#694) | 4 months ago
AlpinDale | c34a6ac8e4 | feat: add lora loading/unloading api endpoint (#693) | 4 months ago
AlpinDale | 7debd35ca2 | fix: shut down ray dag workers cleanly (#692) | 4 months ago
AlpinDale | ec32f999bc | build: bump cmake to 3.26 (#691) | 4 months ago
AlpinDale | 4fe371b7fa | fix: allow passing float for GiB arguments (#690) | 4 months ago
AlpinDale | 6144150398 | chore: use scalar type to dispatch to different `gptq_marlin` kernels (#689) | 4 months ago
AlpinDale | 24456206a9 | fix: logit softcapping in flash-attn (#688) | 4 months ago
AlpinDale | f3bfdfb923 | chore: use public ECR for neuron image (#687) | 4 months ago
AlpinDale | 3f712cd287 | feat: add progress bar for loading individual weight modules (#640) | 4 months ago