david/aphrodite-engine: PygmalionAI's large-scale inference engine (pygmalion.chat). It is designed to serve as the inference endpoint for the PygmalionAI website and to serve the Pygmalion models to a large number of users at blazing-fast speeds (thanks to vLLM's PagedAttention).

AlpinDale 92cee435e2 rocm: add more quants, fix _scaled_mm call (#1062)		1 week ago
__init__.py	5cb2e998d8 quants: update compressed tensors lifecycle to remove `prefix` from `create_weights` (#924)	3 weeks ago
compressed_tensors_scheme.py	f1d0b77c92 [0.6.0] Release Candidate (#481)	4 months ago
compressed_tensors_w4a16_24.py	f1e1d0bd3d feat: introduce `BaseAphroditeParameter` (#646)	4 months ago
compressed_tensors_w8a16_fp8.py	04da8c33bd Revert "chore: use the `compressed-tensors` library to avoid code reuse (#704)" (#706)	3 months ago
compressed_tensors_w8a8_fp8.py	92cee435e2 rocm: add more quants, fix _scaled_mm call (#1062)	1 week ago
compressed_tensors_w8a8_int8.py	04da8c33bd Revert "chore: use the `compressed-tensors` library to avoid code reuse (#704)" (#706)	3 months ago
compressed_tensors_wNa16.py	9f3e7c86e2 feat: add fused Marlin MoE kernel (#934)	2 weeks ago