Commit history

Author SHA1 Message Date
  AlpinDale 814c1ddeba feat: add CustomOp interface for device portability 7 months ago
  AlpinDale f91f217bf8 fix: do not skip `prompt_logprobs` when `SamplingParams.detokenize=True` 7 months ago
  AlpinDale 5b5e6dc359 chore: add batch size 1536 and 3072 to moe benchmark 7 months ago
  AlpinDale a7fb48acdf fix: setuptools version in dockerfile for cpu 7 months ago
  AlpinDale 05d6e43244 fix: `torch.compile()` with mp executor backend 7 months ago
  AlpinDale 4bdd2f9892 chore: enhance MoE benchmarking 7 months ago
  AlpinDale e321d80e4e fix: `prompt_logprobs==0` case 7 months ago
  AlpinDale 141c602c39 feat: OpenAI `tools` support named functions 7 months ago
  AlpinDale 237fa59aea feat: support CPU/GPU swapping in BlockManagerV2 7 months ago
  AlpinDale ba02fb65c9 fix: pos encodings for CPU 7 months ago
  AlpinDale 90bafca8e3 fix: cuda graphs with sparseml quants 7 months ago
  AlpinDale 89ee54dcff update dockerfile and enhance serving benchmark 7 months ago
  AlpinDale 75f97bc25d bump flash-attn to remove unnecessary copies in the backend 7 months ago
  AlpinDale 7e1d2c9feb fix: add images/ to gitignore 7 months ago
  AlpinDale 8d77c69cbd feat: support image processor and add llava example 7 months ago
  AlpinDale 00acf371f9 rocm: fused topk softmax 7 months ago
  AlpinDale 78de98463b feat: return max_model_len in /v1/models 7 months ago
  AlpinDale 8c61fb9c19 fix: prevent LLM.encode() to be used with causal models 7 months ago
  AlpinDale 5fecc6b025 when was this deprecated? 7 months ago
  AlpinDale 690110a051 feat: bitsandbytes quantization 7 months ago
  AlpinDale 0307da9e15 refactor: bitsandbytes -> autoquant 7 months ago
  AlpinDale f2c6791527 feat: update cutlass fp8 configs 7 months ago
  AlpinDale 54f4f1e7f3 allow the cutlass kernels to take scales that reside on the GPU 7 months ago
  AlpinDale 52474b8fa9 build: parallelize all build extensions 7 months ago
  AlpinDale 67084aca5b do not build cutlass kernels if cuda version is too low 7 months ago
  AlpinDale b029a544ff optimize eager mode host time with numpy 7 months ago
  AlpinDale ced1b36b8b feat: support head size of 192 7 months ago
  AlpinDale 4ab4c5c87c oops 7 months ago
  AlpinDale 9e79a15b9f fix: ignore warnings for sparseml 7 months ago
  AlpinDale d45c846c8c do not build sm_90a for cuda 11 7 months ago