AlpinDale | b029a544ff | optimize eager mode host time with numpy | 7 months ago
AlpinDale | ced1b36b8b | feat: support head size of 192 | 7 months ago
AlpinDale | 4ab4c5c87c | oops | 7 months ago
AlpinDale | 9e79a15b9f | fix: ignore warnings for sparseml | 7 months ago
AlpinDale | d45c846c8c | do not build sm_90a for cuda 11 | 7 months ago
AlpinDale | 08f639b8aa | remove duplicate seq_lens_tensor | 7 months ago
AlpinDale | 072aec1062 | automatically detect sparseml models | 7 months ago
AlpinDale | 5cedee9024 | fix gemma with gptq marlin | 7 months ago
AlpinDale | 9d19811d4f | avoid the need to pass `None` values to `Sequence.inputs` | 7 months ago
AlpinDale | f2b7a42c4e | fix: async cancels in merge_async_iterators for python>=3.9 | 7 months ago
AlpinDale | 9099040472 | feat: cross-attention kv caching support | 7 months ago
AlpinDale | b2fd915c35 | improve p2p access check | 7 months ago
AlpinDale | 7194047318 | remove vllm-nccl | 7 months ago
AlpinDale | 6785d78d82 | fix: do not expose EOS token in the API | 7 months ago
AlpinDale | 90ceab32ff | refactor: consolidate prompt args to LLM engines | 7 months ago
AlpinDale | e4ea3da1ad | fix: tensor parallel with embedding model | 7 months ago
AlpinDale | f40b809d3b | allow using v2 block manager with sliding window | 7 months ago
AlpinDale | 2649f3f14e | aqlm works on pascal | 7 months ago
AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 7 months ago
AlpinDale | 344ddaac5a | properly disable speculative decoding | 7 months ago
AlpinDale | 696f2cd59c | add phi3_small support with blocksparse attention | 7 months ago
AlpinDale | 0d15aa3ab3 | fix prefix caching for block manager v2 | 7 months ago
AlpinDale | 7d0884de9a | fix mistral v0.3 weight loading | 7 months ago
AlpinDale | e8b7f53321 | allow prompt token IDs in the logits processor api | 7 months ago
AlpinDale | f4ea11b982 | feat: initial support for activation quantization | 7 months ago
Drake | e1a142c179 | Fix OpenAI chat completions compatibility (#559) | 7 months ago
AlpinDale | 5b0c11d190 | support pipeline parallel pynccl groups | 7 months ago
AlpinDale | f6250c5516 | move dockerfiles to root; fix cpu build | 7 months ago
AlpinDale | d8667fcb98 | improve gptq_marlin_24 prefill performance | 7 months ago
AlpinDale | eb2c5c77df | feat: enforce the max possible seqlen | 7 months ago