david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at blazing fast speeds (thanks to vLLM's Paged Attention). @ 801eda0b7a3c3c44ca165c95b461800a31bb770c

AlpinDale 801eda0b7a feat: support GPTQ 2, 3, and 8bit quants (#181)		1 year ago
layers	801eda0b7a feat: support GPTQ 2, 3, and 8bit quants (#181)	1 year ago
megatron	e7b6a2d5a0 chore: tensor parallel refactors part 2 (#116)	1 year ago
models	b9b295d74e chore: backlogs 1 (#191)	1 year ago
__init__.py	653da510d1 chore: rewrite InputMetadata (#143)	1 year ago
hf_downloader.py	725be3e0de feat: mixtral HF with expert parallelism (#167)	1 year ago
loader.py	730357c7d5 chore: implement lazy module loader for models (#165)	1 year ago
metadata.py	7d91e9e0f2 feat: CUDA graphs (#172)	1 year ago
sampling_metadata.py	2aab3da9bd chore: fix Python 3.8+ compatibility (#170)	1 year ago
utils.py	e7b6a2d5a0 chore: tensor parallel refactors part 2 (#116)	1 year ago