attention
|
6a57861fca
feat: initial XPU support via intel_extension_for_pytorch (#571)
|
7 months ago |
common
|
0613d91551
fix: kv head calculation with MPT GQA
|
7 months ago |
distributed
|
017b42c517
chore: use fork as the default method for mp backend
|
7 months ago |
endpoints
|
c05a45f22f
chore: minor updates to throughput benchmark and llm class
|
7 months ago |
engine
|
3c7444c89b
fix: asyncio.run hangs in python < 3.12
|
7 months ago |
executor
|
017b42c517
chore: use fork as the default method for mp backend
|
7 months ago |
kv_quant
|
e42a78381a
feat: switch from pylint to ruff (#322)
|
1 year ago |
lora
|
42d2ee0f43
chore: better error logging for unsupported lora weights
|
7 months ago |
modeling
|
025322ee5f
fix: fp8 kv cache for qwen2 models
|
7 months ago |
multimodal
|
f2e94e2184
chore: minor llava cleanups in preparation for llava-next
|
7 months ago |
processing
|
f9a10145d1
fix: v2 block manager + prefix caching
|
7 months ago |
quantization
|
cd9ed8623b
fix: cuda version check for fp8 support in the cutlass kernels
|
7 months ago |
spec_decode
|
313e6e1ec7
feat: add typical acceptance sampling
|
7 months ago |
task_handler
|
6a57861fca
feat: initial XPU support via intel_extension_for_pytorch (#571)
|
7 months ago |
transformers_utils
|
bba89fc6d3
chore: make the automatic rope scaling behave properly with rope_scaling arg, add rope theta
|
7 months ago |
__init__.py
|
a07fc83bc8
chore: proper util for aphrodite version
|
7 months ago |
_custom_ops.py
|
cd9ed8623b
fix: cuda version check for fp8 support in the cutlass kernels
|
7 months ago |
_ipex_ops.py
|
6a57861fca
feat: initial XPU support via intel_extension_for_pytorch (#571)
|
7 months ago |
py.typed
|
1c988a48b2
fix logging and add py.typed
|
1 year ago |
version.py
|
7e54c3916d
chore: factor out epilogues from cutlass kernels
|
7 months ago |