david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at high speed (thanks to vLLM's Paged Attention). @ 3a0fdf7b9b85d83a6723865aaba5326620cf0f36

AlpinDale 3a0fdf7b9b chore: remove `image_input_type` from VLM config		7 months ago
..
__init__.py	bbde979ecd DeepSeek-V2 (#579)	7 months ago
arctic.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
baichuan.py	c5d8028668 fix: no need to redefine supports_vision and supports_lora in model class	7 months ago
bloom.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
chatglm.py	c5d8028668 fix: no need to redefine supports_vision and supports_lora in model class	7 months ago
clip.py	3a0fdf7b9b chore: remove `image_input_type` from VLM config	7 months ago
commandr.py	da6765c084 feat: lora support for commandr models	7 months ago
dbrx.py	b2cb5a92e9 fix: missing cache_config for dbrx	7 months ago
decilm.py	56e0b8223c chore: add base class for LoRA-supported models	7 months ago
deepseek.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
deepseek_v2.py	bbde979ecd DeepSeek-V2 (#579)	7 months ago
falcon.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
gemma.py	b6ff0623a6 chore: clean up branding	7 months ago
gpt2.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
gpt_bigcode.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
gpt_j.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
gpt_neox.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
interfaces.py	85ef2fe8b1 chore: clean up placeholder symbols	7 months ago
internlm2.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
jais.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
llama.py	c5d8028668 fix: no need to redefine supports_vision and supports_lora in model class	7 months ago
llama_embedding.py	50b7c13db0 refactor: attention selector (#552)	7 months ago
llava.py	3a0fdf7b9b chore: remove `image_input_type` from VLM config	7 months ago
llava_next.py	3a0fdf7b9b chore: remove `image_input_type` from VLM config	7 months ago
minicpm.py	c5d8028668 fix: no need to redefine supports_vision and supports_lora in model class	7 months ago
mixtral.py	c5d8028668 fix: no need to redefine supports_vision and supports_lora in model class	7 months ago
mixtral_quant.py	b6ff0623a6 chore: clean up branding	7 months ago
mlp_speculator.py	de7e6919c0 feat: support tied weights and input scale for MLPSpeculator	7 months ago
mpt.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
olmo.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
opt.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
orion.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
phi.py	c5d8028668 fix: no need to redefine supports_vision and supports_lora in model class	7 months ago
phi3_small.py	696f2cd59c add phi3_small support with blocksparse attention	7 months ago
phi3v.py	3a0fdf7b9b chore: remove `image_input_type` from VLM config	7 months ago
qwen.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
qwen2.py	c5d8028668 fix: no need to redefine supports_vision and supports_lora in model class	7 months ago
qwen2_moe.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
stablelm.py	656459fd84 make fp8_e4m3 work on nvidia	7 months ago
starcoder2.py	ac79d115b3 add guards for prefix caching, fp8, chunked, etc	7 months ago
xverse.py	c5d8028668 fix: no need to redefine supports_vision and supports_lora in model class	7 months ago