david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website, and to allow serving the Pygmalion models to a large number of users with blazing fast speeds (thanks to vLLM's Paged Attention). @ 205c8e4106a9fc0cdc45102ea11b0eed80a807aa

50h100a f663d3fccc Merge pull request #397 from 50h100a/pr_samplerasserts	11 months ago
attention	78d66f16d1 Chunked Prefill Part 1 (#384)	11 months ago
fused_moe	f8dfac6372 chore: attention refactor and upstream sync apr01 (#365)	11 months ago
ops	9181fa0396 feat: Triton kernels for sampling (#383)	11 months ago
quantization	f8652c8e99 fix: optimize aqlm dequantization (#325)	1 year ago
__init__.py	07aa2a492f upstream: add option to specify tokenizer	1 year ago
activation.py	3d6695cfbb feat: add approximate gelu activation kernels (#370)	11 months ago
layernorm.py	e31c6f0b45 feat: refactor modeling logic and support more models (#274)	1 year ago
linear.py	e42a78381a feat: switch from pylint to ruff (#322)	1 year ago
rejection.py	f8dfac6372 chore: attention refactor and upstream sync apr01 (#365)	11 months ago
rotary_embedding.py	e702f587cf feat: add batched RoPE kernels (#371)	11 months ago
sampler.py	f663d3fccc Merge pull request #397 from 50h100a/pr_samplerasserts	11 months ago
vocab_parallel_embedding.py	968bde81bf fix: tensor parallel with GPTQ and AWQ quants (#307)	1 year ago