AlpinDale | 4d4e767838 | ci: take one of fixing lint issues | 4 months ago
AlpinDale | 0e6c400b13 | feat: re-add GGUF (#600) | 4 months ago
AlpinDale | c9310eeb02 | fix: skip loading lm_head for tie_word_embeddings models | 4 months ago
AlpinDale | 1441326ac8 | fix: cleanup minicpm-v and port na_vit model | 4 months ago
AlpinDale | e348ca3540 | feat: add support for MiniCPM-V | 4 months ago
AlpinDale | cb44c8daa8 | feat: support FP8 KV Cache scales from compressed-tensors | 4 months ago
AlpinDale | 00503b9fc1 | feat: non-uniform quantization via `compressed-tensors` for llama | 4 months ago
AlpinDale | 0429cb2229 | fix: only create embeddings and lm_head when necessary for PP | 4 months ago
AlpinDale | 5289c14b24 | feat: Asymmetric Tensor Parallel (#594) | 4 months ago
AlpinDale | 9d7beaa5b9 | chore: separate kv_scale into k_scale and v_scale | 4 months ago
AlpinDale | 497bf64942 | chore: simplify pipeline parallel code in llama | 4 months ago
AlpinDale | 0f4a9ee77b | quantized lm_head (#582) | 4 months ago
AlpinDale | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 4 months ago
AlpinDale | c5d8028668 | fix: no need to redefine supports_vision and supports_lora in model class | 5 months ago
AlpinDale | 56e0b8223c | chore: add base class for LoRA-supported models | 5 months ago
AlpinDale | 690110a051 | feat: bitsandbytes quantization | 5 months ago
AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 5 months ago
AlpinDale | f4ea11b982 | feat: initial support for activation quantization | 5 months ago
AlpinDale | c1ed789835 | fix: typo in llama.py | 5 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 5 months ago
AlpinDale | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 5 months ago
AlpinDale | 2ecfa98da9 | re-fix mistral nemo | 5 months ago
AlpinDale | 50b7c13db0 | refactor: attention selector (#552) | 5 months ago
AlpinDale | 54a4cef647 | add bias and tie word embedding support for llama | 5 months ago
AlpinDale | 639e48e47d | fix: mistral nemo | 5 months ago
AlpinDale | b178ae4b4a | chore: generalize linear_method to be quant_method (#540) | 5 months ago
AlpinDale | e7b1368156 | feat: Phi3 support | 6 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 6 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago
AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 9 months ago