david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed (thanks to vLLM's PagedAttention). @ 4ed1bb99584e715127c2f01a16255437969c35dd

AlpinDale abbb730607 feat: support draft model on different tensor parallel size		7 months ago
cutlass_benchmarks	765adcfba1 chore: add w8a8 benchmark scripts	7 months ago
attention.py	156f577f79 feat: switch from `PYBIND11_MODULE` to `TORCH_LIBRARY` (#569)	7 months ago
backend_request_func.py	89ee54dcff update dockerfile and enhance serving benchmark	7 months ago
benchmark_moe.py	5b5e6dc359 chore: add batch size 1536 and 3072 to moe benchmark	7 months ago
hashing.py	c6a501f682 add multiprocessing executor; make ray optional	7 months ago
latency.py	e1f3fd1e02 fix: test units (#201)	1 year ago
launch_tgi.sh	4d04ade9ef feat: fine-grained seeds (#279)	1 year ago
serving.py	89ee54dcff update dockerfile and enhance serving benchmark	7 months ago
sonnet.txt	89ee54dcff update dockerfile and enhance serving benchmark	7 months ago
throughput.py	abbb730607 feat: support draft model on different tensor parallel size	7 months ago