Name | Commit | Message | Last updated
adapter_commons | 99680b2d23 | feat: soft prompts (#589) | 6 months ago
attention | d8f9f0ec16 | fix: prefix prefill kernels for fp32 data type | 6 months ago
common | bf15e1b4e8 | chore: deprecation warning for beam search | 6 months ago
distributed | cc6399792f | fix: keep consistent with how pytorch finds libcudart.so | 6 months ago
endpoints | a3b56353fa | fix: another one missed | 6 months ago
engine | 0c17c2a8a7 | chore: add commit hash, clean up engine logs | 6 months ago
executor | 23408b9b2b | chore: skip the driver worker | 6 months ago
inputs | 4f7d212b70 | feat: remove vision language config | 6 months ago
kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago
lora | 99680b2d23 | feat: soft prompts (#589) | 6 months ago
modeling | e13a66925c | feat: add fuyu vision model and persimmon language model support | 6 months ago
multimodal | c11a8bdaad | fix: calculate max number of multi-modal tokens automatically | 6 months ago
platforms | 1a40bf438b | fix: incorrect gpu capability when used mixed gpus | 6 months ago
processing | 99680b2d23 | feat: soft prompts (#589) | 6 months ago
prompt_adapter | 99680b2d23 | feat: soft prompts (#589) | 6 months ago
quantization | 1efd0f89b7 | feat: support FP8 for DeepSeekV2 MoE | 6 months ago
spec_decode | 16dff9babc | chore: enable bonus token in spec decoding for KV cache based models | 6 months ago
task_handler | ddb28a80a3 | fix: bump torch for rocm, unify CUDA_VISIBLE_DEVICES for cuda and rocm | 6 months ago
transformers_utils | 63becc67c0 | fix: prompt logprob detokenization | 6 months ago
__init__.py | 0c17c2a8a7 | chore: add commit hash, clean up engine logs | 6 months ago
_custom_ops.py | ad24e74a99 | feat: FP8 weight-only quantization support for Ampere GPUs | 6 months ago
_ipex_ops.py | 6a57861fca | feat: initial XPU support via intel_extension_for_pytorch (#571) | 7 months ago
py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago
version.py | 0c17c2a8a7 | chore: add commit hash, clean up engine logs | 6 months ago