AlpinDale | 9be43994fe | feat: fbgemm quantization support (#601) | 5 months ago
AlpinDale | 9d7beaa5b9 | chore: separate kv_scale into k_scale and v_scale | 5 months ago
AlpinDale | 2105e4fd6b | feat: correctly invoke prefill & decode kernels for cross-attention | 5 months ago
AlpinDale | 7e66e8f899 | fix: only add `Attention.kv_scale` if kv cache quant is enabled | 5 months ago
AlpinDale | ac79d115b3 | add guards for prefix caching, fp8, chunked, etc | 6 months ago
AlpinDale | 696f2cd59c | add phi3_small support with blocksparse attention | 6 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 6 months ago
AlpinDale | 0c15965621 | fix fp8 kv | 6 months ago
AlpinDale | a94de94c44 | refactor: combine the prefill and decode into a single API (#553) | 6 months ago
AlpinDale | 50b7c13db0 | refactor: attention selector (#552) | 6 months ago
AlpinDale | 6fc1ec6e9a | fix redirects and improve low level debugging | 6 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 8 months ago