Commit History

Author SHA1 Message Date
  AlpinDale da6765c084 feat: lora support for commandr models 7 months ago
  AlpinDale 70ec3a7b93 chore: make the dockerfile a bit better 7 months ago
  AlpinDale 9b4c72a801 feat: support channel-wise quant for w8a8 dynamic per token activation quant 7 months ago
  AlpinDale 79b1c0b861 fix: do not error out if two processes do not agree on p2p capability 7 months ago
  AlpinDale e6d70101b3 feat: add support for phi-3 vision model 7 months ago
  AlpinDale 313e6e1ec7 feat: add typical acceptance sampling 7 months ago
  AlpinDale 0613d91551 fix: kv head calculation with MPT GQA 7 months ago
  AlpinDale b5694be865 chore: use a pool to reuse LogicalTokenBlock.token_ids 7 months ago
  AlpinDale c05a45f22f chore: minor updates to throughput benchmark and llm class 7 months ago
  AlpinDale dfa59bc5f9 fix: 16 GPUs in a cluster 7 months ago
  AlpinDale 5a925923e3 fix: numba cache 7 months ago
  AlpinDale 964aa08a70 fix: serializer log 7 months ago
  AlpinDale 5aa910a022 chore: allow building on non-avx512 machines 7 months ago
  AlpinDale 6a57861fca feat: initial XPU support via intel_extension_for_pytorch (#571) 7 months ago
  AlpinDale e2dbe5f05c feat: add sparse marlin for compressed tensors 7 months ago
  AlpinDale e2e64a6241 fix: limit numpy version 7 months ago
  AlpinDale e4407bbcb7 fix: do not start a ray cluster when not using ray 7 months ago
  AlpinDale ee174ea4fd fix: guard for lora + chunked prefill 7 months ago
  AlpinDale f9a10145d1 fix: v2 block manager + prefix caching 7 months ago
  AlpinDale 44331a4d00 chore: improve p2p cache generation 7 months ago
  AlpinDale c8c6de64cd fix: typo in pallas backend 7 months ago
  AlpinDale cc3486477e fix: benign multiprocessing error 7 months ago
  AlpinDale c482c09a3a fix: remove duplicated input processing in async engine 7 months ago
  AlpinDale d0afe0cd21 fix: suppress mma.sp warning on CUDA 12.5 and above 7 months ago
  AlpinDale a33aaf3b42 chore: cleanup compressed tensors 7 months ago
  AlpinDale 94f4e278ff fix: illegal mem access for cutlass fp8 kernels 7 months ago
  AlpinDale 8c32e49029 feat: add avx2 cpu support 7 months ago
  AlpinDale a89c9a0e92 fix: device ordinal issues with world_size and stuff 7 months ago
  AlpinDale ab7f4ed6e5 chore: revert commit for removing unnecessary copies in flash attn backend 7 months ago
  AlpinDale 06ed127441 fix: do not raise optimization warning for fp8 quant 7 months ago