AlpinDale
|
7e1d2c9feb
fix: add images/ to gitignore
|
7 months ago |
AlpinDale
|
8d77c69cbd
feat: support image processor and add llava example
|
7 months ago |
AlpinDale
|
00acf371f9
rocm: fused topk softmax
|
7 months ago |
AlpinDale
|
78de98463b
feat: return max_model_len in /v1/models
|
7 months ago |
AlpinDale
|
8c61fb9c19
fix: prevent LLM.encode() from being used with causal models
|
7 months ago |
AlpinDale
|
5fecc6b025
when was this deprecated?
|
7 months ago |
AlpinDale
|
690110a051
feat: bitsandbytes quantization
|
7 months ago |
AlpinDale
|
0307da9e15
refactor: bitsandbytes -> autoquant
|
7 months ago |
AlpinDale
|
f2c6791527
feat: update cutlass fp8 configs
|
7 months ago |
AlpinDale
|
54f4f1e7f3
allow the cutlass kernels to take scales that reside on the GPU
|
7 months ago |
AlpinDale
|
52474b8fa9
build: parallelize all build extensions
|
7 months ago |
AlpinDale
|
67084aca5b
do not build cutlass kernels if cuda version is too low
|
7 months ago |
AlpinDale
|
b029a544ff
optimize eager mode host time with numpy
|
7 months ago |
AlpinDale
|
ced1b36b8b
feat: support head size of 192
|
7 months ago |
AlpinDale
|
4ab4c5c87c
oops
|
7 months ago |
AlpinDale
|
9e79a15b9f
fix: ignore warnings for sparseml
|
7 months ago |
AlpinDale
|
d45c846c8c
do not build sm_90a for cuda 11
|
7 months ago |
AlpinDale
|
08f639b8aa
remove duplicate seq_lens_tensor
|
7 months ago |
AlpinDale
|
072aec1062
automatically detect sparseml models
|
7 months ago |
AlpinDale
|
5cedee9024
fix gemma with gptq marlin
|
7 months ago |
AlpinDale
|
9d19811d4f
avoid the need to pass `None` values to `Sequence.inputs`
|
7 months ago |
AlpinDale
|
f2b7a42c4e
fix: async cancels in merge_async_iterators for python>=3.9
|
7 months ago |
AlpinDale
|
9099040472
feat: cross-attention kv caching support
|
7 months ago |
AlpinDale
|
b2fd915c35
improve p2p access check
|
7 months ago |
AlpinDale
|
7194047318
remove vllm-nccl
|
7 months ago |
AlpinDale
|
6785d78d82
fix: do not expose EOS token in the API
|
7 months ago |
AlpinDale
|
90ceab32ff
refactor: consolidate prompt args to LLM engines
|
7 months ago |
AlpinDale
|
e4ea3da1ad
fix: tensor parallel with embedding model
|
7 months ago |
AlpinDale
|
f40b809d3b
allow using v2 block manager with sliding window
|
7 months ago |
AlpinDale
|
2649f3f14e
aqlm works on pascal
|
7 months ago |