Commit History

Author     SHA1        Message                                                         Date
AlpinDale  7e54c3916d  chore: factor out epilogues from cutlass kernels                7 months ago
AlpinDale  156f577f79  feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)   7 months ago
AlpinDale  6cecbbff6a  fix: reduce memory footprint of cuda graph by adding output buffer  7 months ago
AlpinDale  ab5ffb228c  fp8: `act_scale` -> `input_scale`                               7 months ago
AlpinDale  e9c0a248dc  fix: support check for fp8 cutlass                              7 months ago
AlpinDale  40bc98b363  chore: use cutlass kernels for fp8 if supported                 7 months ago
AlpinDale  39b36efabf  fix: mixtral fp8 ckpt loading                                   7 months ago
AlpinDale  656459fd84  make fp8_e4m3 work on nvidia                                    7 months ago
AlpinDale  c4c153863e  improve fp8 linear layer performance                            8 months ago
AlpinDale  7d23892501  static and dynamic fp8                                          8 months ago
AlpinDale  36660b55c2  chore: mixtral fp8 w/ static scales (#542)                      8 months ago
AlpinDale  b178ae4b4a  chore: generalize linear_method to be quant_method (#540)       8 months ago
AlpinDale  46159b107a  formatting: pt1                                                 8 months ago
AlpinDale  fca911ee0a  vLLM Upstream Sync (#526)                                       8 months ago