AlpinDale
|
87694c8aba
feat: add RPC server and client via ZMQ (#615)
|
4 months ago |
AlpinDale
|
6124140a45
fix: remove error_on_invalid_device_count_status
|
4 months ago |
AlpinDale
|
5cb760162c
feat: allow loading specific layer numbers per device
|
4 months ago |
AlpinDale
|
705e50f4bd
fix: broadcasting logic for multi_modal_kwargs
|
4 months ago |
AlpinDale
|
6157acf775
feat: add support for head_size of 120
|
4 months ago |
AlpinDale
|
42c66d5b00
feat: tensor parallelism for CPU backend
|
4 months ago |
AlpinDale
|
32bdbd1ee4
chore: add fp8 support to `reshape_and_cache_flash`
|
4 months ago |
AlpinDale
|
e25024da4f
chore: move some verbose logs to debug
|
5 months ago |
AlpinDale
|
51ea8ad376
chore: modularize prepare input and attn metadata builder
|
5 months ago |
AlpinDale
|
6ac658b0d6
some small performance improvements
|
5 months ago |
AlpinDale
|
b7a2d52e47
fix: allow using mp executor for pipeline parallel
|
5 months ago |
AlpinDale
|
cf381a0c54
OpenAI API Refactor (#591)
|
5 months ago |
AlpinDale
|
ddb28a80a3
fix: bump torch for rocm, unify CUDA_VISIBLE_DEVICES for cuda and rocm
|
5 months ago |
AlpinDale
|
a5fafaa9ce
chore: add more tuning for the CPU backend via intel-openmp
|
5 months ago |
AlpinDale
|
5257ebce8c
fix: device >= 0 && device < num_gpus INTERNAL_ASSERT FAILED
|
5 months ago |
AlpinDale
|
cda0e93a10
abstract away the platform for device capability
|
5 months ago |
AlpinDale
|
7d79c0e726
chore: use nvml query to avoid accidental cuda initialization
|
5 months ago |
AlpinDale
|
0886c361f4
feat: OpenVINO CPU backend (#576)
|
5 months ago |
AlpinDale
|
2c321ce1f2
chore: upgrade to rocm 6.1, update docker
|
5 months ago |
AlpinDale
|
25feb1d592
chore: add support for pinning lora adapters in the lru cache
|
5 months ago |
AlpinDale
|
6a57861fca
feat: initial XPU support via intel_extension_for_pytorch (#571)
|
5 months ago |
AlpinDale
|
a89c9a0e92
fix: device ordinal issues with world_size and stuff
|
5 months ago |
AlpinDale
|
fe21123a1c
feat: TPU support (#570)
|
5 months ago |
AlpinDale
|
156f577f79
feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)
|
5 months ago |
AlpinDale
|
b029a544ff
optimize eager mode host time with numpy
|
5 months ago |
AlpinDale
|
f2b7a42c4e
fix: async cancels in merge_async_iterators for python>=3.9
|
5 months ago |
AlpinDale
|
7194047318
remove vllm-nccl
|
5 months ago |
AlpinDale
|
90ceab32ff
refactor: consolidate prompt args to LLM engines
|
5 months ago |
AlpinDale
|
656459fd84
make fp8_e4m3 work on nvidia
|
6 months ago |
AlpinDale
|
251568470e
initial nvidia fp8 e4m3 for kv cache
|
6 months ago |