AlpinDale | 7253e9052d | feat: integrate typical acceptance sampling for spec decoding | 5 months ago
AlpinDale | b8a19ba27f | chore: extend aphrodite metrics logging api | 5 months ago
AlpinDale | bbde979ecd | DeepSeek-V2 (#579) | 5 months ago
AlpinDale | b8650ec51d | fix: better error message for MLPSpeculator | 5 months ago
AlpinDale | 0886c361f4 | feat: OpenVINO CPU backend (#576) | 5 months ago
AlpinDale | c0c336aaa3 | refactor: registry for processing model inputs; quick_gelu; clip model support | 5 months ago
AlpinDale | 51cfadeb29 | fix: `MLPSpeculator` handling of `num_speculative_tokens` | 5 months ago
AlpinDale | 2c321ce1f2 | chore: upgrade to rocm 6.1, update docker | 5 months ago
AlpinDale | 80ac1cdc8f | fix: add args for the draft tp | 5 months ago
AlpinDale | af43576da0 | feat: add MLPSpeculator speculative decoding support (#572) | 5 months ago
AlpinDale | 0613d91551 | fix: kv head calculation with MPT GQA | 5 months ago
AlpinDale | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 5 months ago
AlpinDale | e4407bbcb7 | fix: do not start a ray cluster when not using ray | 5 months ago
AlpinDale | ee174ea4fd | fix: guard for lora + chunked prefill | 5 months ago
AlpinDale | a89c9a0e92 | fix: device ordinal issues with world_size and stuff | 5 months ago
AlpinDale | 06ed127441 | fix: do not raise optimization warning for fp8 quant | 5 months ago
AlpinDale | fe21123a1c | feat: TPU support (#570) | 5 months ago
AlpinDale | fa58ba87a3 | fix: only set executor backend to mp if not multi-node | 5 months ago
AlpinDale | bba89fc6d3 | chore: make the automatic rope scaling behave properly with rope_scaling arg, add rope theta | 5 months ago
AlpinDale | 517676249c | chore: update the compressed-tensors config | 5 months ago
AlpinDale | 76d6f49bbb | fix: modelscope downloads | 5 months ago
AlpinDale | f2e94e2184 | chore: minor llava cleanups in preparation for llava-next | 5 months ago
AlpinDale | 237fa59aea | feat: support CPU/GPU swapping in BlockManagerV2 | 5 months ago
AlpinDale | 8d77c69cbd | feat: support image processor and add llava example | 5 months ago
AlpinDale | 690110a051 | feat: bitsandbytes quantization | 5 months ago
AlpinDale | 0307da9e15 | refactor: bitsandbytes -> autoquant | 5 months ago
AlpinDale | 072aec1062 | automatically detect sparseml models | 5 months ago
AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 5 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 6 months ago
AlpinDale | 60e74e92fd | add rope_scaling arg | 6 months ago