| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| AlpinDale | 525edc1283 | build: fix compilation for causal_conv1d_fwd kernel signature (#1057) | 1 week ago |
| AlpinDale | 9bdf8d5bfa | mamba: enable continuous batching for mamba kernels (#1055) | 1 week ago |
| AlpinDale | 239a8cae25 | torch.compile: register all-reduce operations as custom ops (#1050) | 1 week ago |
| AlpinDale | 1390915778 | multi-step: add support for flashinfer attention backend (#1033) | 1 week ago |
| AlpinDale | 93bc863591 | feat: Machete Kernels for Hopper GPUs (#842) | 1 month ago |
| AlpinDale | bfc8988116 | feat: add cuda sampling kernels for top_k and top_p (#828) | 1 month ago |
| Naomiusearch | eee3cf5dab | fix: make AMD usable (#775) | 2 months ago |
| AlpinDale | f1d0b77c92 | [0.6.0] Release Candidate (#481) | 4 months ago |
| AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago |
| AlpinDale | e702f587cf | feat: add batched RoPE kernels (#371) | 9 months ago |
| AlpinDale | 3d6695cfbb | feat: add approximate gelu activation kernels (#370) | 9 months ago |
| AlpinDale | f8dfac6372 | chore: attention refactor and upstream sync apr01 (#365) | 9 months ago |
| AlpinDale | c41462cfcd | feat: exllamav2 quantization (#305) | 10 months ago |
| AlpinDale | 9810daa699 | feat: INT8 KV Cache (#298) | 10 months ago |
| AlpinDale | e0c35bb353 | feat: bitsandbytes and `--load-in{4,8}bit` support (#294) | 10 months ago |
| AlpinDale | 705821a7fe | feat: AQLM quantization support (#293) | 10 months ago |
| AlpinDale | 72229a94da | feat: better marlin kernels (#285) | 10 months ago |
| AlpinDale | e31c6f0b45 | feat: refactor modeling logic and support more models (#274) | 10 months ago |
| AlpinDale | 224b87b484 | feat: add fused mixtral moe support (#238) | 10 months ago |
| AlpinDale | d9b65e6c5f | feat: DeepSeek MoE support (#237) | 11 months ago |
| AlpinDale | c3a221eb02 | feat: GGUF, QuIP#, and Marlin support (#228) | 11 months ago |
| AlpinDale | 31c95011a6 | feat: FP8 E5M2 KV Cache (#226) | 11 months ago |
| AlpinDale | 641bb0f6e9 | feat: add custom allreduce kernels (#224) | 11 months ago |
| AlpinDale | 5053743c1c | feat: speedup AWQ (#223) | 11 months ago |
| AlpinDale | 8fa608aeb7 | feat: replace Ray with NCCL for control plane comms (#221) | 11 months ago |
| AlpinDale | 7e72ce0a73 | feat: mixtral tensor parallelism (#193) | 1 year ago |
| AlpinDale | 15a0454172 | feat: FP8 KV Cache (#185) | 1 year ago |
| AlpinDale | 801eda0b7a | feat: support GPTQ 2, 3, and 8bit quants (#181) | 1 year ago |
| AlpinDale | 7c6fdea535 | fix: GPTQ warnings and exllama states (#171) | 1 year ago |
| AlpinDale | 02f3ab3501 | fix: replace head_mapping with num_kv_heads (#161) | 1 year ago |