attention | ca6b69966d | fix: explicitly end_forward() calls to flashinfer | 7 months ago
common | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 6 months ago
distributed | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 6 months ago
endpoints | 63b735bc2a | chore: optimize v2 block manager to match the performance of v1 | 6 months ago
engine | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 6 months ago
executor | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 6 months ago
inputs | 3a0fdf7b9b | chore: remove `image_input_type` from VLM config | 6 months ago
kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago
lora | 0f4a9ee77b | quantized lm_head (#582) | 6 months ago
modeling | 0f4a9ee77b | quantized lm_head (#582) | 6 months ago
multimodal | dd378ea063 | feat: MLPSpeculator with tensor parallel | 6 months ago
processing | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 6 months ago
quantization | 0f4a9ee77b | quantized lm_head (#582) | 6 months ago
spec_decode | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 6 months ago
task_handler | ae04f57ec1 | feat: Pipeline Parallel support (#581) | 6 months ago
transformers_utils | 3a0fdf7b9b | chore: remove `image_input_type` from VLM config | 6 months ago
__init__.py | a07fc83bc8 | chore: proper util for aphrodite version | 7 months ago
_custom_ops.py | c0c336aaa3 | refactor: registry for processing model inputs; quick_gelu; clip model support | 7 months ago
_ipex_ops.py | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 7 months ago
py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago
version.py | 7e54c3916d | chore: factor out epilogues from cutlass kernels | 7 months ago