AlpinDale 71a26f0998 chore: use pytorch sdpa backend to do naive attention for rocm 7 months ago
attention 71a26f0998 chore: use pytorch sdpa backend to do naive attention for rocm 7 months ago
common 76d6f49bbb fix: modelscope downloads 7 months ago
distributed b2fd915c35 improve p2p access check 7 months ago
endpoints 1d7f5c45b0 feat: add stream_options for chat completions 7 months ago
engine d7ebffe2f0 chore: re-add the graceful engine shutdown 7 months ago
executor 17eb1b7eb9 chore: remove ray health check 7 months ago
kv_quant e42a78381a feat: switch from pylint to ruff (#322) 1 year ago
lora c975bba905 fix: sharded state loader with lora 7 months ago
modeling b2cb5a92e9 fix: missing cache_config for dbrx 7 months ago
multimodal f2e94e2184 chore: minor llava cleanups in preparation for llava-next 7 months ago
processing 3f92035bf1 fix: add `ignored_seq_groups` in `_schedule_chunked_prefill` 7 months ago
quantization e9c0a248dc fix: support check for fp8 cutlass 7 months ago
spec_decode ec5b99d075 fix: use named args 7 months ago
task_handler c975bba905 fix: sharded state loader with lora 7 months ago
transformers_utils 76d6f49bbb fix: modelscope downloads 7 months ago
__init__.py be8154a8a0 feat: proper embeddings API with e5-mistral-7b support 7 months ago
py.typed 1c988a48b2 fix logging and add py.typed 1 year ago