Commit History

Author SHA1 Message Date
AlpinDale 87694c8aba feat: add RPC server and client via ZMQ (#615) 5 months ago
AlpinDale 6124140a45 fix: remove error_on_invalid_device_count_status 5 months ago
AlpinDale 5cb760162c feat: allow loading specific layer numbers per device 5 months ago
AlpinDale 705e50f4bd fix: broadcasting logic for multi_modal_kwargs 5 months ago
AlpinDale 6157acf775 feat: add support for head_size of 120 5 months ago
AlpinDale 42c66d5b00 feat: tensor parallelism for CPU backend 5 months ago
AlpinDale 32bdbd1ee4 chore: add fp8 support to `reshape_and_cache_flash` 5 months ago
AlpinDale e25024da4f chore: move some verbose logs to debug 5 months ago
AlpinDale 51ea8ad376 chore: modularize prepare input and attn metadata builder 5 months ago
AlpinDale 6ac658b0d6 some small performance improvements 5 months ago
AlpinDale b7a2d52e47 fix: allow using mp executor for pipeline parallel 5 months ago
AlpinDale cf381a0c54 OpenAI API Refactor (#591) 5 months ago
AlpinDale ddb28a80a3 fix: bump torch for rocm, unify CUDA_VISIBLE_DEVICES for cuda and rocm 5 months ago
AlpinDale a5fafaa9ce chore: add more tuning for the CPU backend via intel-openmp 5 months ago
AlpinDale 5257ebce8c fix: device >= 0 && device < num_gpus INTERNAL_ASSERT FAILED 5 months ago
AlpinDale cda0e93a10 abstract away the platform for device capability 5 months ago
AlpinDale 7d79c0e726 chore: use nvml query to avoid accidental cuda initialization 5 months ago
AlpinDale 0886c361f4 feat: OpenVINO CPU backend (#576) 5 months ago
AlpinDale 2c321ce1f2 chore: upgrade to rocm 6.1, update docker 5 months ago
AlpinDale 25feb1d592 chore: add support for pinning lora adapters in the lru cache 5 months ago
AlpinDale 6a57861fca feat: initial XPU support via intel_extension_for_pytorch (#571) 5 months ago
AlpinDale a89c9a0e92 fix: device ordinal issues with world_size and stuff 6 months ago
AlpinDale fe21123a1c feat: TPU support (#570) 6 months ago
AlpinDale 156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) 6 months ago
AlpinDale b029a544ff optimize eager mode host time with numpy 6 months ago
AlpinDale f2b7a42c4e fix: async cancels in merge_async_iterators for python>=3.9 6 months ago
AlpinDale 7194047318 remove vllm-nccl 6 months ago
AlpinDale 90ceab32ff refactor: consolidate prompt args to LLM engines 6 months ago
AlpinDale 656459fd84 make fp8_e4m3 work on nvidia 6 months ago
AlpinDale 251568470e initial nvidia fp8 e4m3 for kv cache 6 months ago