quantization/                 62b2c4119d  feat: re-write GPTQ and refactor exllama kernels (#152)   1 year ago
__init__.py                   07aa2a492f  upstream: add option to specify tokenizer                 1 year ago
activation.py                 5dbd5f8c30  fix: quant TP (#129)                                      1 year ago
attention.py                  653da510d1  chore: rewrite InputMetadata (#143)                       1 year ago
layernorm.py                  1aab8a7d6f  feat: speedup compilation times by 3x (#130)              1 year ago
linear.py                     62b2c4119d  feat: re-write GPTQ and refactor exllama kernels (#152)   1 year ago
rotary_embedding.py           e386032ae8  fix: rope duplication (#142)                              1 year ago
sampler.py                    653da510d1  chore: rewrite InputMetadata (#143)                       1 year ago
sampler_mirostat.py           653da510d1  chore: rewrite InputMetadata (#143)                       1 year ago
vocab_parallel_embedding.py   e7b6a2d5a0  chore: tensor parallel refactors part 2 (#116)            1 year ago