AlpinDale | 561588d97d | wip | 3 months ago
AlpinDale | 5630aa378b | protocol and abstract class | 3 months ago
AlpinDale | 8e0d376f1c | ci: bump aphrodite to 0.6.1 (#722) | 3 months ago
AlpinDale | 12e40ae6fd | chore: update grafana template (#721) | 3 months ago
AlpinDale | 61c7182491 | feat: enable prompt logprobs in OpenAI API (#720) | 3 months ago
AlpinDale | 28b6397188 | chore: quant config for speculative draft models (#719) | 3 months ago
AlpinDale | 8e22069c9e | fix: weight loading for scalars (#718) | 3 months ago
AlpinDale | d289c3855b | fix: install protobuf for cpu (#716) | 3 months ago
AlpinDale | 008e646c7e | chore: add support for up to 2048 block size (#715) | 3 months ago
AlpinDale | 1c519cc6ac | chore: set per-rank XLA cache for TPU (#714) | 3 months ago
AlpinDale | 577586309d | chore: multi-step args and sequence modifications (#713) | 3 months ago
AlpinDale | 0b8b407b6d | feat: support profiling with multiple multi-modal inputs per prompt (#712) | 3 months ago
AlpinDale | d5033e12fd | feat: implement mistral tokenizer mode (#711) | 3 months ago
AlpinDale | ebf01d665b | fix: disable embeddings API for chat models (#710) | 3 months ago
AlpinDale | 198029295c | fix: empty sampler output when temperature is too low (#709) | 3 months ago
AlpinDale | 8878e3a63f | fix: import ray under a guard (#708) | 3 months ago
AlpinDale | 36241f98af | feat: add support for multi-host tpu (#707) | 3 months ago
AlpinDale | 04da8c33bd | Revert "chore: use the `compressed-tensors` library to avoid code reuse (#704)" (#706) | 3 months ago
AlpinDale | f76f2a5af0 | feat: add aphrodite plugin system (#705) | 3 months ago
AlpinDale | f5bbf07c90 | chore: use the `compressed-tensors` library to avoid code reuse (#704) | 3 months ago
AlpinDale | 2d044af0e1 | chore: spawn engine process from api server process (#703) | 3 months ago
AlpinDale | edec2e9a9e | feat: migrate awq and awq_marlin to AphroditeParameter (#702) | 3 months ago
AlpinDale | 0c6f03b7e4 | feat: add Solar model support (#701) | 3 months ago
AlpinDale | 4ec08af18b | chore: update fused MoE weight loading (#700) | 3 months ago
AlpinDale | 4f6020cc86 | chore: migrate gptq_marlin to AphroditeParameters (#699) | 3 months ago
AlpinDale | 3693028340 | feat: support for Audio modality (#698) | 3 months ago
AlpinDale | 31483a7d3b | fix: manually install triton for other devices to prevent outlines errors (#697) | 3 months ago
AlpinDale | 5d37ec1016 | suppress tpu import warning (#696) | 3 months ago
AlpinDale | 0e558e9b2f | fix: loading chameleon model with TP>1 (#695) | 3 months ago
AlpinDale | 1d3a1fec47 | feat: add load/unload endpoints for soft-prompts (#694) | 3 months ago