david/aphrodite-engine

Author	SHA1 Message	Date
AlpinDale	7e54c3916d chore: factor out epilogues from cutlass kernels	7 months ago
AlpinDale	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
AlpinDale	6cecbbff6a fix: reduce memory footprint of cuda graph by adding output buffer	7 months ago
AlpinDale	ab5ffb228c fp8: `act_scale` -> `input_scale`	7 months ago
AlpinDale	e9c0a248dc fix: support check for fp8 cutlass	7 months ago
AlpinDale	40bc98b363 chore: use cutlass kernels for fp8 if supported	7 months ago
AlpinDale	39b36efabf fix: mixtral fp8 ckpt loading	7 months ago
AlpinDale	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
AlpinDale	c4c153863e improve fp8 linear layer performance	8 months ago
AlpinDale	7d23892501 static and dynamic fp8	8 months ago
AlpinDale	36660b55c2 chore: mixtral fp8 w/ static scales (#542)	8 months ago
AlpinDale	b178ae4b4a chore: generalize linear_method to be quant_method (#540)	8 months ago
AlpinDale	46159b107a formatting: pt1	8 months ago
AlpinDale	fca911ee0a vLLM Upstream Sync (#526)	8 months ago