| Name | Last commit | Commit message | Age |
|---|---|---|---|
| attention | 93cffaf446 | add flash_attn back | 7 months ago |
| common | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| distributed | c58589318f | remove the graph mode func | 7 months ago |
| endpoints | fe431bb840 | check for next port if current is unavailable | 7 months ago |
| engine | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| executor | eaa06fdd14 | fix some f-strings | 7 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago |
| lora | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| modeling | f970f3f3fb | add base class for VLMs | 7 months ago |
| processing | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| quantization | 8e11259e90 | missing triton autoconfig for rocm flash attn | 7 months ago |
| spec_decode | 236be273e5 | feat: tensor parallel speculative decoding (#554) | 7 months ago |
| task_handler | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| transformers_utils | 9e73559eba | make use of batched rotary embedding kernels to support long context lora | 7 months ago |
| __init__.py | be8154a8a0 | feat: proper embeddings API with e5-mistral-7b support | 7 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |