| Name | Commit | Message | Last updated |
| --- | --- | --- | --- |
| attention | 1270b5567e | triton compile error for flash_attn | 8 months ago |
| common | fcfb72af24 | Support arbitrary model in GGUF. (#381) | 8 months ago |
| distributed | b1caee23a6 | cache the p2p access check for memory saving | 8 months ago |
| endpoints | b1caee23a6 | cache the p2p access check for memory saving | 8 months ago |
| engine | bd0ddf1cfe | feat: EETQ quantization (#408) | 8 months ago |
| executor | 373e0d3c01 | fix neuron | 8 months ago |
| kv_quant | e42a78381a | feat: switch from pylint to ruff (#322) | 10 months ago |
| lora | fe17712f29 | fully working chunked prefill | 8 months ago |
| modeling | fcfb72af24 | Support arbitrary model in GGUF. (#381) | 8 months ago |
| processing | fe17712f29 | fully working chunked prefill | 8 months ago |
| spec_decode | 4d33ce60da | feat: Triton flash attention backend for ROCm (#407) | 8 months ago |
| task_handler | 6e0761ba5d | make init_distributed_environment compatible with init_process_group | 8 months ago |
| transformers_utils | c18bf116da | fix stop strings not being excluded from outputs | 8 months ago |
| __init__.py | c2aaaefd57 | allow out-of-tree model registry | 9 months ago |
| py.typed | 1c988a48b2 | fix logging and add py.typed | 1 year ago |