.. |
attention
|
0d3562a7f9
MQA in triton FA
|
8 months ago |
common
|
f22b700ee4
feat: marlin kernels for GPTQ (#547)
|
7 months ago |
distributed
|
ac5b4b6aa7
broadcast metadata through cpu
|
7 months ago |
endpoints
|
46159b107a
formatting: pt1
|
8 months ago |
engine
|
b1555eb208
add new grafana metrics
|
7 months ago |
executor
|
fb982981ce
num_lookahead_slots in neuron and ray executors
|
7 months ago |
kv_quant
|
e42a78381a
feat: switch from pylint to ruff (#322)
|
1 year ago |
lora
|
e87c32bed3
feat: full tensor parallel for LoRA layers (#545)
|
7 months ago |
modeling
|
639e48e47d
fix: mistral nemo
|
7 months ago |
processing
|
aed64884c6
feat: prompt logprobs with chunked prefill (#539)
|
8 months ago |
quantization
|
ac5b4b6aa7
broadcast metadata through cpu
|
7 months ago |
spec_decode
|
9d81716bfd
[v0.5.3] Release Candidate (#388)
|
10 months ago |
task_handler
|
aed64884c6
feat: prompt logprobs with chunked prefill (#539)
|
8 months ago |
transformers_utils
|
3bbfd65549
feat: support hub model ID when offline
|
7 months ago |
__init__.py
|
199e776722
chore: move ray utils to executor dir
|
8 months ago |
py.typed
|
1c988a48b2
fix logging and add py.typed
|
1 year ago |