Commit History

Author     SHA1        Date          Message
AlpinDale  cda0e93a10  7 months ago  abstract away the platform for device capability
AlpinDale  cf472315cc  7 months ago  refactor: isolate FP8 from mixtral
AlpinDale  7d79c0e726  7 months ago  chore: use nvml query to avoid accidental cuda initialization
AlpinDale  ddb3323f94  7 months ago  refactor: have w8a8 compressed tensors use `process_weights_after_load` for fp8
AlpinDale  272c64ab88  7 months ago  chore: allow loading fp8 models with fused qkv/mlp
AlpinDale  cd9ed8623b  7 months ago  fix: cuda version check for fp8 support in the cutlass kernels
AlpinDale  7e54c3916d  7 months ago  chore: factor out epilogues from cutlass kernels
AlpinDale  156f577f79  7 months ago  feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)
AlpinDale  6cecbbff6a  7 months ago  fix: reduce memory footprint of cuda graph by adding output buffer
AlpinDale  ab5ffb228c  7 months ago  fp8: `act_scale` -> `input_scale`
AlpinDale  e9c0a248dc  7 months ago  fix: support check for fp8 cutlass
AlpinDale  40bc98b363  7 months ago  chore: use cutlass kernels for fp8 if supported
AlpinDale  39b36efabf  7 months ago  fix: mixtral fp8 ckpt loading
AlpinDale  656459fd84  8 months ago  make fp8_e4m3 work on nvidia
AlpinDale  c4c153863e  8 months ago  improve fp8 linear layer performance
AlpinDale  7d23892501  8 months ago  static and dynamic fp8
AlpinDale  36660b55c2  8 months ago  chore: mixtral fp8 w/ static scales (#542)
AlpinDale  b178ae4b4a  8 months ago  chore: generalize linear_method to be quant_method (#540)
AlpinDale  46159b107a  8 months ago  formatting: pt1
AlpinDale  fca911ee0a  8 months ago  vLLM Upstream Sync (#526)