| Name               | Last commit | Message                                                                                      | Age          |
|--------------------|-------------|----------------------------------------------------------------------------------------------|--------------|
| attention          | 6a57861fca  | feat: initial XPU support via intel_extension_for_pytorch (#571)                               | 7 months ago |
| common             | 0613d91551  | fix: kv head calculation with MPT GQA                                                          | 7 months ago |
| distributed        | 79b1c0b861  | fix: do not error our if two processes do not agree on p2p capability                          | 7 months ago |
| endpoints          | c05a45f22f  | chore: minor updates to throughput benchmark and llm class                                     | 7 months ago |
| engine             | 6a57861fca  | feat: initial XPU support via intel_extension_for_pytorch (#571)                               | 7 months ago |
| executor           | dfa59bc5f9  | fix: 16 GPUs in a cluster                                                                      | 7 months ago |
| kv_quant           | e42a78381a  | feat: switch from pylint to ruff (#322)                                                        | 1 year ago   |
| lora               | 42d2ee0f43  | chore: better error logging for unsupported lora weights                                       | 7 months ago |
| modeling           | da6765c084  | feat: lora support for commandr models                                                         | 7 months ago |
| multimodal         | f2e94e2184  | chore: minor llava cleanups in preparation for llava-next                                      | 7 months ago |
| processing         | f9a10145d1  | fix: v2 block manager + prefix caching                                                         | 7 months ago |
| quantization       | 9b4c72a801  | feat: support channel-wise quant for w8a8 dynamic per token activation quant                   | 7 months ago |
| spec_decode        | 313e6e1ec7  | feat: add typical acceptance sampling                                                          | 7 months ago |
| task_handler       | 6a57861fca  | feat: initial XPU support via intel_extension_for_pytorch (#571)                               | 7 months ago |
| transformers_utils | bba89fc6d3  | chore: make the automatic rope scaling behave properly with rope_scaling arg, add rope theta   | 7 months ago |
| __init__.py        | a07fc83bc8  | chore: proper util for aphrodite version                                                       | 7 months ago |
| _custom_ops.py     | 6a57861fca  | feat: initial XPU support via intel_extension_for_pytorch (#571)                               | 7 months ago |
| _ipex_ops.py       | 6a57861fca  | feat: initial XPU support via intel_extension_for_pytorch (#571)                               | 7 months ago |
| py.typed           | 1c988a48b2  | fix logging and add py.typed                                                                   | 1 year ago   |
| version.py         | 7e54c3916d  | chore: factor out epilogues from cutlass kernels                                               | 7 months ago |