Name | Commit | Message | Updated
cutlass_benchmarks | 765adcfba1 | chore: add w8a8 benchmark scripts | 7 months ago
attention.py | 156f577f79 | feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569) | 7 months ago
backend_request_func.py | 89ee54dcff | update dockerfile and enhance serving benchmark | 7 months ago
benchmark_moe.py | 5b5e6dc359 | chore: add batch size 1536 and 3072 to moe benchmark | 7 months ago
hashing.py | c6a501f682 | add multiprocessing executor; make ray optional | 7 months ago
latency.py | e1f3fd1e02 | fix: test units (#201) | 1 year ago
launch_tgi.sh | 4d04ade9ef | feat: fine-grained seeds (#279) | 1 year ago
serving.py | 89ee54dcff | update dockerfile and enhance serving benchmark | 7 months ago
sonnet.txt | 89ee54dcff | update dockerfile and enhance serving benchmark | 7 months ago
throughput.py | abbb730607 | feat: support draft model on different tensor parallel size | 7 months ago