| Author | Commit | Message | Date |
|---|---|---|---|
| AlpinDale | dd18c5042c | move prepare_inputs to the GPU (#596) | 5 months ago |
| AlpinDale | f5d52320da | Port mamba kernels to Aphrodite (#595) | 5 months ago |
| AlpinDale | 9d7beaa5b9 | chore: separate kv_scale into k_scale and v_scale | 5 months ago |
| AlpinDale | 5be90c3859 | Mamba infrastructure support (#586) | 5 months ago |
| AlpinDale | c0c336aaa3 | refactor: registry for processing model inputs; quick_gelu; clip model support | 5 months ago |
| AlpinDale | 271a680026 | feat: inference support for PowerPC ISA | 5 months ago |
| AlpinDale | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 5 months ago |
| AlpinDale | 696f2cd59c | add phi3_small support with blocksparse attention | 5 months ago |
| AlpinDale | 3bdeb3e116 | fix: clang formatting for all kernels (#558) | 6 months ago |
| AlpinDale | 35ae01d7ba | refactor: attention metadata term | 6 months ago |
| AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago |
| AlpinDale | e702f587cf | feat: add batched RoPE kernels (#371) | 9 months ago |
| AlpinDale | 3d6695cfbb | feat: add approximate gelu activation kernels (#370) | 9 months ago |
| AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 9 months ago |
| AlpinDale | c41462cfcd | feat: exllamav2 quantization (#305) | 10 months ago |
| AlpinDale | 9810daa699 | feat: INT8 KV Cache (#298) | 10 months ago |
| AlpinDale | e0c35bb353 | feat: bitsandbytes and `--load-in{4,8}bit` support (#294) | 10 months ago |
| AlpinDale | 705821a7fe | feat: AQLM quantization support (#293) | 10 months ago |
| AlpinDale | 72229a94da | feat: better marlin kernels (#285) | 10 months ago |
| AlpinDale | e31c6f0b45 | feat: refactor modeling logic and support more models (#274) | 11 months ago |
| AlpinDale | 224b87b484 | feat: add fused mixtral moe support (#238) | 11 months ago |
| AlpinDale | d9b65e6c5f | feat: DeepSeek MoE support (#237) | 11 months ago |
| AlpinDale | c3a221eb02 | feat: GGUF, QuIP#, and Marlin support (#228) | 1 year ago |
| AlpinDale | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 1 year ago |
| AlpinDale | 641bb0f6e9 | feat: add custom allreduce kernels (#224) | 1 year ago |
| AlpinDale | 5053743c1c | feat: speedup AWQ (#223) | 1 year ago |
| AlpinDale | 8fa608aeb7 | feat: replace Ray with NCCL for control plane comms (#221) | 1 year ago |
| AlpinDale | 7e72ce0a73 | feat: mixtral tensor parallelism (#193) | 1 year ago |
| AlpinDale | 15a0454172 | feat: FP8 KV Cache (#185) | 1 year ago |
| AlpinDale | 801eda0b7a | feat: support GPTQ 2, 3, and 8bit quants (#181) | 1 year ago |