attention | 0d3562a7f9 | MQA in triton FA | 8 months ago
common | f22b700ee4 | feat: marlin kernels for GPTQ (#547) | 7 months ago
distributed | ac5b4b6aa7 | broadcast metadata through cpu | 7 months ago
endpoints | 46159b107a | formatting: pt1 | 8 months ago
engine | b1555eb208 | add new grafana metrics | 7 months ago
executor | fb982981ce | num_lookahead_slots in neuron and ray executors | 7 months ago
kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 1 year ago
lora | e87c32bed3 | feat: full tensor parallel for LoRA layers (#545) | 7 months ago
modeling | 639e48e47d | fix: mistral nemo | 7 months ago
processing | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 8 months ago
quantization | ac5b4b6aa7 | broadcast metadata through cpu | 7 months ago
spec_decode | 9d81716bfd | [v0.5.3] Release Candidate (#388) | 10 months ago
task_handler | aed64884c6 | feat: prompt logprobs with chunked prefill (#539) | 8 months ago
transformers_utils | 3bbfd65549 | feat: support hub model ID when offline | 7 months ago
__init__.py | 199e776722 | chore: move ray utils to executor dir | 8 months ago
py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago