AlpinDale | 7e54c3916d | chore: factor out epilogues from cutlass kernels | 7 months ago
AlpinDale | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago
AlpinDale | 6cecbbff6a | fix: reduce memory footprint of cuda graph by adding output buffer | 7 months ago
AlpinDale | ab5ffb228c | fp8: `act_scale` -> `input_scale` | 7 months ago
AlpinDale | e9c0a248dc | fix: support check for fp8 cutlass | 7 months ago
AlpinDale | 40bc98b363 | chore: use cutlass kernels for fp8 if supported | 7 months ago
AlpinDale | 39b36efabf | fix: mixtral fp8 ckpt loading | 7 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 7 months ago
AlpinDale | c4c153863e | improve fp8 linear layer performance | 8 months ago
AlpinDale | 7d23892501 | static and dynamic fp8 | 8 months ago
AlpinDale | 36660b55c2 | chore: mixtral fp8 w/ static scales (#542) | 8 months ago
AlpinDale | b178ae4b4a | chore: generalize linear_method to be quant_method (#540) | 8 months ago
AlpinDale | 46159b107a | formatting: pt1 | 8 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 8 months ago