Commit History

Author SHA1 Message Date
AlpinDale 017b42c517 chore: use fork as the default method for mp backend 7 months ago
AlpinDale cd9ed8623b fix: cuda version check for fp8 support in the cutlass kernels 7 months ago
AlpinDale fad77538de feat: update cutlass int8 kernel configs for sm90 7 months ago
AlpinDale b753ff7870 feat: per-channel support for static activation quant 7 months ago
AlpinDale 3c7444c89b fix: asyncio.run hangs in python < 3.12 7 months ago
AlpinDale d44ac8e497 fix: `--preemption_mode` -> `--preemption-mode` 7 months ago
AlpinDale bcf9c83e6a fix: incorrect args passed to generate() method in phi3v example 7 months ago
AlpinDale 025322ee5f fix: fp8 kv cache for qwen2 models 7 months ago
AlpinDale 323fe23b21 chore: use 127.0.0.1 for single-node setups 7 months ago
AlpinDale 89be49d058 fix: build for mi300x 7 months ago
AlpinDale 7d3da17e19 fix: phi3 rope scaling 7 months ago
AlpinDale 765adcfba1 chore: add w8a8 benchmark scripts 7 months ago
AlpinDale 1587fab5de fix: cuda version check for mma warning suppression 7 months ago
AlpinDale d1f91d0f70 fix: greedy sampling being not greedy in concurrent situations where penalties are used 7 months ago
AlpinDale da6765c084 feat: lora support for commandr models 7 months ago
AlpinDale 70ec3a7b93 chore: make the dockerfile a bit better 7 months ago
AlpinDale 9b4c72a801 feat: support channel-wise quant for w8a8 dynamic per token activation quant 7 months ago
AlpinDale 79b1c0b861 fix: do not error out if two processes do not agree on p2p capability 7 months ago
AlpinDale e6d70101b3 feat: add support for phi-3 vision model 7 months ago
AlpinDale 313e6e1ec7 feat: add typical acceptance sampling 7 months ago
AlpinDale 0613d91551 fix: kv head calculation with MPT GQA 7 months ago
AlpinDale b5694be865 chore: use a pool to reuse LogicalTokenBlock.token_ids 7 months ago
AlpinDale c05a45f22f chore: minor updates to throughput benchmark and llm class 7 months ago
AlpinDale dfa59bc5f9 fix: 16 GPUs in a cluster 7 months ago
AlpinDale 5a925923e3 fix: numba cache 7 months ago
AlpinDale 964aa08a70 fix: serializer log 7 months ago
AlpinDale 5aa910a022 chore: allow building on non-avx512 machines 7 months ago
AlpinDale 6a57861fca feat: initial XPU support via intel_extension_for_pytorch (#571) 7 months ago
AlpinDale e2dbe5f05c feat: add sparse marlin for compressed tensors 7 months ago
AlpinDale e2e64a6241 fix: limit numpy version 7 months ago