How to fine-tune Gemma 3 with LoRA using MLX-LM

If you are on a Mac, you might miss CUDA. Without it, fine-tuning an LLM with Transformers or Flax is quite hard, as training will be slow.

Recently, there has been some progress. MLX, together with MLX-LM, is a new package that supports Apple's Metal GPU framework and offers a really convenient interface for machine learning tasks such as inference and fine-tuning.

To fine-tune an LLM and adjust its tone or update its responses, the most widely used method today is to train small additional weight matrices on top of the frozen model weights, a technique called Low-Rank Adaptation (LoRA). With MLX-LM, this is possible in three quite easy steps.
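As a quick sketch of the idea behind LoRA: for a frozen weight matrix W of shape d × k, two small matrices B (d × r) and A (r × k) are trained, with the rank r much smaller than d and k. The adapted layer then computes

y = W x + B A x

so only B and A, which together hold a tiny fraction of the original parameter count, receive gradient updates.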

Step 1: Define your datasets

First, you have to define your training, validation, and test datasets. You can either use datasets from Hugging Face or create your own. If you opt for the latter, create a JSONL file for each set, with one example per line, for example like this:

{"text": "The stock market will go up in...five years"}
{"text": "The bond market will go crazy in three years"}

Step 2: Train the model

To train the model, you first have to choose a base model. The recently released Gemma 3 models from Google are a great choice, as they are quite small and offer good performance on regular hardware.

#!/bin/zsh

mlx_lm.lora \
    --model google/gemma-3-1b-it \
    --train \
    --data ./training/ \
    --batch-size 1 \
    --adapter-path ./results \
    --iters 600

This saves the adapter weights in "./results". As MLX uses a somewhat different metadata format than the Transformers library from Hugging Face, you cannot use this adapter for inference with libraries other than MLX. So what I actually recommend after training is to fuse the LoRA weights back into the original model.
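Before fusing, you can optionally evaluate the adapter on your test set. A minimal sketch, assuming the default test.jsonl from step 1 and the --test flag of mlx_lm.lora:

#!/bin/zsh

# Compute the loss of the trained adapter on test.jsonl
mlx_lm.lora \
    --model google/gemma-3-1b-it \
    --adapter-path ./results \
    --data ./training/ \
    --test

Fusing then merges the adapter into the base weights: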

#!/bin/zsh

mlx_lm.fuse \
    --model google/gemma-3-1b-it \
    --adapter-path ./results \
    --save-path ./fused_model
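
If you want to share the result, mlx_lm.fuse also offers, as far as I know, an --upload-repo option that pushes the fused model to the Hugging Face Hub. The repository name below is a placeholder:

#!/bin/zsh

# Hypothetical repo name, replace with your own Hugging Face namespace
mlx_lm.fuse \
    --model google/gemma-3-1b-it \
    --adapter-path ./results \
    --upload-repo your-username/gemma-3-1b-it-lora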

Step 3: Inference

Now you can use the fused model and try a few prompts to test it:

#!/bin/zsh

mlx_lm.generate \
    --model ./fused_model \
    --prompt "The stock market will go up in..."
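
During experimentation, you can also skip the fuse step entirely: as far as I know, mlx_lm.generate can load the un-fused adapter on top of the base model via --adapter-path:

#!/bin/zsh

# Generate with the raw adapter instead of the fused model
mlx_lm.generate \
    --model google/gemma-3-1b-it \
    --adapter-path ./results \
    --prompt "The stock market will go up in..." \
    --max-tokens 100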

Of course, this is just a basic introduction. The more interesting part of MLX-LM is MLX itself, as it integrates with Swift and provides a great base library for other machine learning tasks. You should definitely check out the examples in the mlx-examples repository as well.

For sample notebooks covering more advanced use of MLX, I recommend my repository with example notebooks, available here: https://github.com/matt-do-it/GenEmbeddings.