AlpinDale | 237fa59aea | feat: support CPU/GPU swapping in BlockManagerV2 | 7 months ago
AlpinDale | ba02fb65c9 | fix: pos encodings for CPU | 7 months ago
AlpinDale | 90bafca8e3 | fix: cuda graphs with sparseml quants | 7 months ago
AlpinDale | 89ee54dcff | update dockerfile and enhance serving benchmark | 7 months ago
AlpinDale | 75f97bc25d | bump flash-attn to remove unnecessary copies in the backend | 7 months ago
AlpinDale | 7e1d2c9feb | fix: add images/ to gitignore | 7 months ago
AlpinDale | 8d77c69cbd | feat: support image processor and add llava example | 7 months ago
AlpinDale | 00acf371f9 | rocm: fused topk softmax | 7 months ago
AlpinDale | 78de98463b | feat: return max_model_len in /v1/models | 7 months ago
AlpinDale | 8c61fb9c19 | fix: prevent LLM.encode() from being used with causal models | 7 months ago
AlpinDale | 5fecc6b025 | when was this deprecated? | 7 months ago
AlpinDale | 690110a051 | feat: bitsandbytes quantization | 7 months ago
AlpinDale | 0307da9e15 | refactor: bitsandbytes -> autoquant | 7 months ago
AlpinDale | f2c6791527 | feat: update cutlass fp8 configs | 7 months ago
AlpinDale | 54f4f1e7f3 | allow the cutlass kernels to take scales that reside on the GPU | 7 months ago
AlpinDale | 52474b8fa9 | build: parallelize all build extensions | 7 months ago
AlpinDale | 67084aca5b | do not build cutlass kernels if cuda version is too low | 7 months ago
AlpinDale | b029a544ff | optimize eager mode host time with numpy | 7 months ago
AlpinDale | ced1b36b8b | feat: support head size of 192 | 7 months ago
AlpinDale | 4ab4c5c87c | oops | 7 months ago
AlpinDale | 9e79a15b9f | fix: ignore warnings for sparseml | 7 months ago
AlpinDale | d45c846c8c | do not build sm_90a for cuda 11 | 7 months ago
AlpinDale | 08f639b8aa | remove duplicate seq_lens_tensor | 7 months ago
AlpinDale | 072aec1062 | automatically detect sparseml models | 7 months ago
AlpinDale | 5cedee9024 | fix gemma with gptq marlin | 7 months ago
AlpinDale | 9d19811d4f | avoid the need to pass `None` values to `Sequence.inputs` | 7 months ago
AlpinDale | f2b7a42c4e | fix: async cancels in merge_async_iterators for python>=3.9 | 7 months ago
AlpinDale | 9099040472 | feat: cross-attention kv caching support | 7 months ago
AlpinDale | b2fd915c35 | improve p2p access check | 7 months ago
AlpinDale | 7194047318 | remove vllm-nccl | 7 months ago