## GPTQ Conversion to Marlin

### Acquiring a compatible GPTQ model

First, you will need a GPTQ model that satisfies the following conditions:

- `bits=4`
- `group_size=128` or `group_size=-1`
- `desc_act=False`
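
If you already have a GPTQ checkpoint and aren't sure whether it qualifies, you can inspect its quantization config. Here is a minimal sketch, assuming the checkpoint stores its parameters in AutoGPTQ's `quantize_config.json` (or under a `quantization_config` key in `config.json`); the `model_dir` path is a placeholder:

```py
import json
from pathlib import Path

model_dir = Path("/path/to/gptq/model")

# AutoGPTQ writes its parameters to quantize_config.json; some checkpoints
# instead nest them under "quantization_config" in config.json.
cfg_path = model_dir / "quantize_config.json"
if not cfg_path.exists():
    cfg_path = model_dir / "config.json"

cfg = json.loads(cfg_path.read_text())
cfg = cfg.get("quantization_config", cfg)

print("bits:", cfg.get("bits"))              # must be 4
print("group_size:", cfg.get("group_size"))  # must be 128 or -1
print("desc_act:", cfg.get("desc_act"))      # must be False
```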

If your model does not meet the requirements above, run the following script to convert an FP16 model to the appropriate GPTQ format:

```py
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "mistralai/Mistral-7B-Instruct-v0.2"
quantized_model_dir = "/path/to/output"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

# Calibration data for the GPTQ algorithm. One example is enough to run the
# script, but more (and more representative) samples generally give better
# quantization quality.
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."
    )
]

# Marlin-compatible settings: 4-bit, group_size of 128 (or -1), no act-order.
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,
)

# Load the FP16 model, quantize it with the calibration examples, and save
# the result as safetensors.
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir, use_safetensors=True)

# Optional sanity check: reload the quantized model on GPU.
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")
```

Replace `pretrained_model_dir` and `quantized_model_dir` with the paths to your base model and output directory. Save the script as `quantize.py` and run it:

```sh
CUDA_VISIBLE_DEVICES=0 python quantize.py
```

You may need to install the AutoGPTQ library via `pip install auto-gptq`.

Once you have your compatible GPTQ model, follow the steps below to convert it to Marlin format.

### Converting GPTQ models to Marlin

You will need to clone and install the Marlin repository:

```sh
git clone https://github.com/IST-DASLab/marlin && cd marlin

pip install -e .
```
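
To sanity-check the install, you can try importing the package; this assumes the repository builds a top-level `marlin` module, which requires a compatible NVIDIA GPU and CUDA toolchain:

```sh
python -c "import marlin"
```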

Then, from inside the cloned repository, run:

```sh
python convert.py --model-id /path/to/gptq/model --save-path /path/to/output/marlin
```

That's all you need to do. Launch Aphrodite with `--model` pointing at the Marlin checkpoint, and you're good to go. Happy prompting.
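
For example, if you use the OpenAI-compatible API server (the exact entry point may vary between Aphrodite versions, so treat this as a sketch and adjust to your usual launch command):

```sh
# Assumed entry point; substitute the path to your converted Marlin checkpoint.
python -m aphrodite.endpoints.openai.api_server --model /path/to/output/marlin
```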