AlpinDale | ac79d115b3 add guards for prefix caching, fp8, chunked, etc | 6 months ago
AlpinDale | f4ea11b982 feat: initial support for activation quantization | 6 months ago
AlpinDale | c1ed789835 fix: typo in llama.py | 6 months ago
AlpinDale | 656459fd84 make fp8_e4m3 work on nvidia | 6 months ago
AlpinDale | 9e73559eba make use of batched rotary embedding kernels to support long context lora | 6 months ago
AlpinDale | 2ecfa98da9 re-fix mistral nemo | 6 months ago
AlpinDale | 50b7c13db0 refactor: attention selector (#552) | 6 months ago
AlpinDale | 54a4cef647 add bias and tie word embedding support for llama | 6 months ago
AlpinDale | 639e48e47d fix: mistral nemo | 6 months ago
AlpinDale | b178ae4b4a chore: generalize linear_method to be quant_method (#540) | 6 months ago
AlpinDale | e7b1368156 feat: Phi3 support | 7 months ago
AlpinDale | fca911ee0a vLLM Upstream Sync (#526) | 7 months ago
AlpinDale | 9d81716bfd [v0.5.3] Release Candidate (#388) | 8 months ago
AlpinDale | f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) | 10 months ago
sgsdxzy | 6ebac34dc1 chore: cleaner pre-llamafied Yi implementation (#352) | 10 months ago
AlpinDale | 681e94611f fix: restore backwards compatibility with old Yi models (#351) | 10 months ago
AlpinDale | da223153c6 feat&fix: cohere support and missing GPU blocks (#333) | 10 months ago
AlpinDale | e42a78381a feat: switch from pylint to ruff (#322) | 10 months ago
AlpinDale | 9810daa699 feat: INT8 KV Cache (#298) | 11 months ago
AlpinDale | e31c6f0b45 feat: refactor modeling logic and support more models (#274) | 11 months ago
AlpinDale | 842912d022 feat: on-the-fly gguf conversion (#250) | 1 year ago
AlpinDale | c3a221eb02 feat: GGUF, QuIP#, and Marlin support (#228) | 1 year ago
AlpinDale | c0aac15421 feat: S-LoRA support (#222) | 1 year ago
AlpinDale | 8fa608aeb7 feat: replace Ray with NCCL for control plane comms (#221) | 1 year ago
AlpinDale | f013d714c0 chore: merge dev branch into main (#177) | 1 year ago
AlpinDale | 2755a48d51 merge dev branch into main (#153) | 1 year ago
AlpinDale | 887e03669a feat: add exllamav2 for GPTQ (#99) | 1 year ago
AlpinDale | 74604eb252 fix: pylint complaints (#91) | 1 year ago
AlpinDale | efc6f7fbec chore: reformats (#90) | 1 year ago
AlpinDale | a6a4220fa6 feat: refactor megatron and quants (#57) | 1 year ago