AlpinDale | 32bdbd1ee4 chore: add fp8 support to `reshape_and_cache_flash` | 4 months ago |
AlpinDale | 9d7beaa5b9 chore: separate kv_scale into k_scale and v_scale | 5 months ago |
AlpinDale | 156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 5 months ago |
AlpinDale | 3bdeb3e116 fix: clang formatting for all kernels (#558) | 6 months ago |
AlpinDale | 251568470e initial nvidia fp8 e4m3 for kv cache | 6 months ago |
AlpinDale | 8b56dc4347 dict -> torch.Tensor for blocks_to_swap | 6 months ago |
AlpinDale | 21ce19b3ea blocks_to_copy dict -> torch.Tensor | 6 months ago |
AlpinDale | 2351a0e2cd feat: FlashInfer backend for decoding phase (#548) | 6 months ago |
AlpinDale | 9d81716bfd [v0.5.3] Release Candidate (#388) | 8 months ago |
AlpinDale | f8dfac6372 chore: attention refactor and upstream sync apr01 (#365) | 9 months ago |
AlpinDale | 9810daa699 feat: INT8 KV Cache (#298) | 10 months ago |
AlpinDale | 31c95011a6 feat: FP8 E5M2 KV Cache (#226) | 1 year ago |
AlpinDale | 8fa608aeb7 feat: replace Ray with NCCL for control plane comms (#221) | 1 year ago |
AlpinDale | 15a0454172 feat: FP8 KV Cache (#185) | 1 year ago |
AlpinDale | 1aab8a7d6f feat: speedup compilation times by 3x (#130) | 1 year ago |