david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at blazing fast speeds (thanks to vLLM's PagedAttention). @ 0859dc3bc0eae025439fc17fb267a99736581f97

AlpinDale a113309876 kernel: add meta functions for ops to prevent graph breaks (#1019)		2 months ago
__init__.py	07aa2a492f upstream: add option to specify tokenizer	1 year ago
block.py	7df7b8ca53 optimization: reduce end-to-end overhead from python obj allocation (#666)	6 months ago
config.py	f2b6dc3872 cpu: add support for W8A8 quantization via compressed-tensor (#1017)	2 months ago
connections.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	6 months ago
envs.py	a113309876 kernel: add meta functions for ops to prevent graph breaks (#1019)	2 months ago
grammar.py	8a71788372 Add OLMoE (#772)	4 months ago
logger.py	22a4cd4595 core: fix spec decode metrics and envs circular import (#889)	2 months ago
logits_processor.py	8a71788372 Add OLMoE (#772)	4 months ago
outputs.py	0e558e9b2f fix: loading chameleon model with TP>1 (#695)	6 months ago
pooling_params.py	2f61644f6e SPMD optimizations (#824)	3 months ago
sampling_params.py	8d9f1fd4e6 feat: add single user mode (#927)	2 months ago
sequence.py	411ac4f405 vlm: add support for Qwen2-VL model (#1015)	2 months ago
test_utils.py	8a71788372 Add OLMoE (#772)	4 months ago
utils.py	f1ea7711bd core: do not compile ScalarType for torch < 2.4.0 (#938)	2 months ago