AlpinDale | 6ac658b0d6 | some small performance improvements | 5 months ago
AlpinDale | b7a2d52e47 | fix: allow using mp executor for pipeline parallel | 5 months ago
AlpinDale | cf381a0c54 | OpenAI API Refactor (#591) | 5 months ago
AlpinDale | ddb28a80a3 | fix: bump torch for rocm, unify CUDA_VISIBLE_DEVICES for cuda and rocm | 5 months ago
AlpinDale | a5fafaa9ce | chore: add more tuning for the CPU backend via intel-openmp | 5 months ago
AlpinDale | 5257ebce8c | fix: device >= 0 && device < num_gpus INTERNAL_ASSERT FAILED | 5 months ago
AlpinDale | cda0e93a10 | abstract away the platform for device capability | 5 months ago
AlpinDale | 7d79c0e726 | chore: use nvml query to avoid accidental cuda initialization | 5 months ago
AlpinDale | 0886c361f4 | feat: OpenVINO CPU backend (#576) | 6 months ago
AlpinDale | 2c321ce1f2 | chore: upgrade to rocm 6.1, update docker | 6 months ago
AlpinDale | 25feb1d592 | chore: add support for pinning lora adapters in the lru cache | 6 months ago
AlpinDale | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 6 months ago
AlpinDale | a89c9a0e92 | fix: device ordinal issues with world_size and stuff | 6 months ago
AlpinDale | fe21123a1c | feat: TPU support (#570) | 6 months ago
AlpinDale | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 6 months ago
AlpinDale | b029a544ff | optimize eager mode host time with numpy | 6 months ago
AlpinDale | f2b7a42c4e | fix: async cancels in merge_async_iterators for python>=3.9 | 6 months ago
AlpinDale | 7194047318 | remove vllm-nccl | 6 months ago
AlpinDale | 90ceab32ff | refactor: consolidate prompt args to LLM engines | 6 months ago
AlpinDale | 656459fd84 | make fp8_e4m3 work on nvidia | 6 months ago
AlpinDale | 251568470e | initial nvidia fp8 e4m3 for kv cache | 6 months ago
AlpinDale | 4476d2d885 | remove cuda version check | 6 months ago
AlpinDale | 2351a0e2cd | feat: FlashInfer backend for decoding phase (#548) | 6 months ago
AlpinDale | 2656df543b | why was this removed? weird | 6 months ago
AlpinDale | 2e0b115ce1 | move func tracing to utils | 7 months ago
AlpinDale | 46159b107a | formatting: pt1 | 7 months ago
AlpinDale | fca911ee0a | vLLM Upstream Sync (#526) | 7 months ago
AlpinDale | f894f7b176 | Revert "reduce dedupe by wrapping in general worker class" | 8 months ago
AlpinDale | 9fff6fb507 | reduce dedupe by wrapping in general worker class | 8 months ago
AlpinDale | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 9 months ago