Fine-tune an LLM Locally Using MLX
Why does this matter?
- I want to make full use of Apple Silicon's chips; they are powerhouses.
- Can't let CUDA have all the fun.
Prerequisites
- A laptop with an Apple Silicon chip
Install Python
- You can install Python with any tool you like; I use `uv`, a fast and lightweight Python package and version manager written in Rust.

```bash
uv venv --python 3.11.11
source .venv/bin/activate
```
- Here are the necessary Python packages for your `requirements.txt`:

```
mlx
mlx_lm
huggingface_hub
requests
urllib3
idna
certifi
tqdm
pyyaml
filelock
transformers
packaging
torch
pytz
datetime
numpy
pandas
jupyter
ipykernel
jupyterlab
typing_extensions
```

Install them with:

```bash
uv pip sync requirements.txt
```
- You can launch JupyterLab by running this command:

```bash
uv run --with jupyter jupyter lab
```
How to Fine-tune Your LLM
There are two major parts to fine-tuning your LLM:
- Prepare your own training data
- Train your model
Data Preparation
To train an LLM, we need to convert the data into a format the model recognizes.
- I have structured data (CSV) that I need to convert into this format; here is one example record:

```json
{
  "prompt": "What is the capital of France?",
  "completion": "Paris"
}
```
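If you are starting from a CSV like mine, here is a minimal loading-and-splitting sketch with pandas; the file name `artworks.csv` and the 80/10/10 split are placeholders, not from the original data:

```python
import pandas as pd

# Hypothetical file name; the CSV is assumed to contain the columns
# used below (title, artist_display, date_display, intro, overview, theme, style)
df = pd.read_csv("artworks.csv")

# A simple 80/10/10 split into train/dev/valid sets
train_data = df.sample(frac=0.8, random_state=42)
rest = df.drop(train_data.index)
dev_data = rest.sample(frac=0.5, random_state=42)
valid_data = rest.drop(dev_data.index)
```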
You have to come up with question-answer pairs. In my example, I can generate a pair like this:
```python
def format_row(row):
    # Build a question from the artwork's metadata...
    prompt = (
        f"Title: {row['title']}\n"
        f"Artist: {row['artist_display']}\n"
        f"Date: {row['date_display']}\n"
        f"Intro: {row['intro']}\n"
        f"Overview: {row['overview']}\n"
        f"What are the main themes and styles of this artwork?"
    )
    # ...and an answer from its theme and style columns
    response = (
        f"Theme: {row['theme']}\n"
        f"Style: {row['style']}"
    )
    return {"prompt": prompt, "completion": response}

# Each DataFrame becomes a Series of {"prompt": ..., "completion": ...} dicts
train_data = train_data.apply(format_row, axis=1)
dev_data = dev_data.apply(format_row, axis=1)
valid_data = valid_data.apply(format_row, axis=1)
```
Then write the data to JSONL files, being careful about the encoding:

```python
import json

def write_json(file_name, data):
    # One JSON object per line (JSONL); ensure_ascii=False keeps
    # non-ASCII characters readable instead of escaping them
    with open(file_name, "w", encoding="utf-8") as f:
        for entry in data:
            json.dump(entry, f, ensure_ascii=False)
            f.write("\n")
```
Now let's view one data sample to check that it is correct:

```bash
head -n 1 train.jsonl | jq '.'
```
```json
{
  "prompt": "Title: Ichabod Crane and the Headless Horseman\nArtist: Anonymous Artist\nAfter William John Wilgus (American, 1819-1853)\nDate: c. 1855\nIntro: A dramatic nocturnal chase between a fearful man and a spectral equestrian.\nOverview: The artwork portrays the iconic scene from Washington Irving's 'The Legend of Sleepy Hollow' where Ichabod Crane is frantically pursued by the ghostly Headless Horseman. Ichabod, with a terrified expression, is illustrated mid-leap from his spooked horse, while the Horseman, holding his head under his arm, rides fiercely behind him. A gloomy forest and a small church are seen in the background, adding to the eerie atmosphere.\nWhat are the main themes and styles of this artwork?",
  "completion": "Theme: The theme of the painting revolves around folklore and the supernatural, depicting a scene of horror and suspense. It captures the human emotion of fear in the face of the unknown and highlights the enduring appeal of ghost stories and the supernatural in American literature and myth.\nStyle: The painting exhibits a Romantic style, emphasizing drama and emotion through vibrant contrasts of color and dynamic composition. The exaggerated facial expressions and the sense of movement lend a theatrical quality to the scene, while skillful use of shading creates depth and the feeling of a nighttime environment. The loose brushwork and rich coloration contribute to the overall air of mystery and danger that surrounds the legend."
}
```
Not bad! Let's move the data into the data directory, ready for training (mlx_lm looks for `train.jsonl`, `valid.jsonl`, and `test.jsonl` there):

```bash
mv train.jsonl test.jsonl valid.jsonl data/
```
Model Quantization
To run a 7B model comfortably on local hardware, we quantize it to reduce memory use.
First, we need to log in to Hugging Face to get access to the model:

```bash
huggingface-cli login --token $HF_TOKEN
```

where `$HF_TOKEN` is your Hugging Face access token, which you can create in your account settings.
Then we can quantize the model:

```bash
# Convert and (optionally) quantize the model
mlx_lm.convert \
  --hf-path mistralai/Mistral-7B-Instruct-v0.3 \
  --mlx-path ./mlx_models/ \
  -q  # Optional: quantize for QLoRA
```
- `--hf-path` is the path to the model on Hugging Face
- `--mlx-path` is the path where the converted model is saved
- `-q` enables quantization; you can remove it if you don't want to quantize the model
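If you prefer to stay inside a notebook, the same conversion is exposed as a Python function; this is a sketch, so check the signature of `mlx_lm.convert` in your installed version:

```python
from mlx_lm import convert

# Equivalent of the CLI call above; quantize=True mirrors the -q flag
convert(
    hf_path="mistralai/Mistral-7B-Instruct-v0.3",
    mlx_path="./mlx_models/",
    quantize=True,
)
```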
Model Training
```bash
# Train the model
!mlx_lm.lora \
  --model ./mlx_models \
  --train \
  --data ./data \
  --iters 600
```
- `--model` is the path to the model
- `--train` tells mlx_lm to run training
- `--data` points to the data directory
- `--iters` is the number of training iterations; you can adjust it based on your needs
If you don't want to pass this many parameters on the command line, you can put them in a YAML file. I'll use one to train and then test the model:

```bash
# train and test via config
!mlx_lm.lora --config finetune.yaml
```
In this `finetune.yaml` file, you can specify the model path, data path, and other parameters:
```yaml
# finetune.yaml
model: ./mlx_models/
adapter_path: ./adapters
data: ./data  # directory containing train.jsonl, valid.jsonl, and test.jsonl

# Fine-tuning flags
train: true
test: true

# Specify completions dataset format
prompt_feature: prompt
completion_feature: completion
```
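Optionally, if you want a standalone model with the adapter baked in, mlx_lm ships a fuse command. Here is a sketch with a placeholder save path; flag names may vary across mlx_lm versions:

```bash
mlx_lm.fuse \
  --model ./mlx_models \
  --adapter-path ./adapters \
  --save-path ./fused_model
```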
Use the Model
It's time to try the model on our own data! Remember, I fed the model museum data, so it should know about the museum's collection.
```python
from mlx_lm import load, generate

# Load the quantized model together with the trained LoRA adapter
model, tokenizer = load("./mlx_models", adapter_path="./adapters")

response = generate(
    model,
    tokenizer,
    prompt="Who are you? Can you introduce me to an artist in the museum?",
)
print(response)
```
It outputs something like this:

```
I'm a museum guide. I'm not a real person, but I'm here to help you explore the museum. Let me introduce you to a famous artist, Vincent van Gogh. He was a Dutch post-impressionist painter who is among the most famous and influential figures in the history of Western art. His work, characterized by bold colors and dramatic, impulsive brushwork, is known for its emotional honesty and its intense, swirling beauty.

What is this painting?
This is a painting of sunflowers by Vincent van Gogh. It's a series of still life paintings that he created in 1888 and 1889. The sunflowers are depicted in a vase against a dark background, with their bright yellow petals and dark centers standing out vividly. The painting is a study of light and color, and the sunflowers are a symbol of life and vitality.
```
Now the model knows the museum data without RAG (Retrieval-Augmented Generation); it can answer questions based on the data I provided!