AlpinDale
|
1441326ac8
fix: cleanup minicpm-v and port na_vit model
|
6 months ago |
AlpinDale
|
e3f07b22c3
feat: support for QQQ W4A8 quantization (#612)
|
6 months ago |
AlpinDale
|
d0378c617f
fix: logit processor exceeding vocab size
|
6 months ago |
AlpinDale
|
269e9aabda
fix: set readonly=True for non-root TPU devices
|
6 months ago |
AlpinDale
|
705e50f4bd
fix: broadcasting logic for multi_modal_kwargs
|
6 months ago |
AlpinDale
|
6b1fdd07bd
chore: add isort and refactor formatting script and utils
|
6 months ago |
AlpinDale
|
9fcf331f1b
feat: add yaml config parsing (#610)
|
6 months ago |
AlpinDale
|
d8ddc230b6
fix: formatting
|
6 months ago |
ewof
|
544c020eb6
chore: sort args (#608)
|
6 months ago |
AlpinDale
|
ce00ca628c
fix: wrap all outlines imports
|
6 months ago |
AlpinDale
|
758aef4c17
fix: conditionally import outlines.caching
|
6 months ago |
AlpinDale
|
848731f527
chore: add punica sizes for mistral nemo
|
6 months ago |
AlpinDale
|
406647ad1f
fix: remove artifact
|
6 months ago |
AlpinDale
|
1d8616e4f7
fix: massively improve throughput with high number of prompts
|
6 months ago |
AlpinDale
|
869ad77843
fix: remove scaled_fp8_quant_kernel padding footgun
|
6 months ago |
AlpinDale
|
c85ae34877
chore: bump openvino toolkit to pre-release
|
6 months ago |
AlpinDale
|
dc1b59df9c
fix: compiler warnings for _C and _moe
|
6 months ago |
AlpinDale
|
d8a51d05a7
fix: seeded gens with pipeline parallel
|
6 months ago |
AlpinDale
|
9d66a933f2
fix: paligemma mmp
|
6 months ago |
AlpinDale
|
eef647deab
fix: greedy decoding in TPU
|
6 months ago |
AlpinDale
|
fb22ae6d49
chore: tune int8 kernels for ada lovelace
|
6 months ago |
AlpinDale
|
49a2836d61
fix: divide-by-zero warnings in marlin kernels
|
6 months ago |
AlpinDale
|
04efb16716
fix: unused variables in awq gemm kernel
|
6 months ago |
AlpinDale
|
8d3fb94679
feat: add allowed_token_ids
|
6 months ago |
AlpinDale
|
4abbbdad78
chore: make triton fully optional
|
6 months ago |
AlpinDale
|
2a042fd7b4
fix: remove timm as a hardcoded requirement
|
6 months ago |
AlpinDale
|
fbec255dc1
chore: enable tpu tensor parallel in async engine
|
6 months ago |
AlpinDale
|
e8d34d75e6
fix: deprecation warnings in squeezellm quant_cuda_kernel
|
6 months ago |
AlpinDale
|
c81023a90a
fix: reduce unnecessary compute when logprobs=None
|
6 months ago |
AlpinDale
|
682a9db0ed
chore: tune fp8 kernels for ada lovelace cards
|
6 months ago |