AlpinDale | c1c37c755d | bump version to 0.6.0 | 4 months ago
AlpinDale | 75122b20ef | chore: refactor wheel build script | 4 months ago
AlpinDale | 638784c3c9 | docs: fix typos | 4 months ago
AlpinDale | cc7a636ffd | ci: add action for deploying docs | 4 months ago
AlpinDale | 57968a5053 | docs: finalize User & Developer Documentation for Release Candidate (#618) | 4 months ago
AlpinDale | 208cd5405f | fix: cpu offloading with gptq | 4 months ago
AlpinDale | 28946766fb | fix: allow loading GGUF model without .gguf extension | 4 months ago
AlpinDale | 2b85ffb1a5 | chore: minor cleanups | 4 months ago
AlpinDale | 2424207fac | ci: remove isort | 4 months ago
AlpinDale | c4933b1a6d | ci: remove yapf from the formatting script | 4 months ago
AlpinDale | 616de67ff5 | ci: remove yapf | 4 months ago
AlpinDale | f18eeaf59a | ci: codespell fixes | 4 months ago
AlpinDale | 4d4e767838 | ci: take one of fixing lint issues | 4 months ago
AlpinDale | 6b1f96586b | ci: remove clang-format | 4 months ago
AlpinDale | dc00aa7b17 | ci: a few more ignores | 4 months ago
AlpinDale | e63be8e46c | minor CI fixes | 4 months ago
AlpinDale | 0e6c400b13 | feat: re-add GGUF (#600) | 4 months ago
AlpinDale | 6c1eab6a6c | feat: non-blocking transfer in prepare_input | 4 months ago
AlpinDale | 2a349ca3e1 | fix: specify device when loading lora and embedding tensors | 4 months ago
AlpinDale | 9d98f29b3a | chore: update cutlass to 3.5.1 | 4 months ago
AlpinDale | bd210a6cf6 | fix: use args.trust_remote_code | 4 months ago
AlpinDale | e8008f24ed | fix: use ipv4 localhost form for zmq bind | 4 months ago
AlpinDale | 6c2e24de53 | fix: support flashinfer for draft model runner | 4 months ago
AlpinDale | edffcecc67 | chore: add proper logging for spec decoding verification | 4 months ago
AlpinDale | c3ee71a437 | feat: port SiglipVisionModel from transformers | 4 months ago
AlpinDale | 040e5af52b | refactor: factor out code for running uvicorn again | 4 months ago
AlpinDale | 9a50e3b4eb | refactor: minicpmv and port Idefix2VisionTransformer | 4 months ago
AlpinDale | 29f0478f90 | chore: simplify output processing with shortcut for non-parallel sampling and non-beam search usecase (#616) | 4 months ago
AlpinDale | b6c97e4d16 | feat: add guided decoding to LLM | 4 months ago
AlpinDale | 212b9d8a03 | refactor: add has_prefix_cache_hit flag to FlashAttentionMetadataBuilder | 4 months ago