Fine-tuning my first open LLM

Disclaimer: This is a draft blog post. I am dumping links, quotes and thoughts here until I'm actually ready to finalize it.

Table of Contents

Recommendations

Source: https://unsloth.ai/docs/get-started/fine-tuning-llms-guide 

  • 4 major approaches:
    • You'll also need to decide between normal/full fine-tuning, RL, QLoRA or LoRA training.
    • We recommend starting with QLoRA, as it is one of the most accessible and effective methods for training models. With our dynamic 4-bit quants, the accuracy loss for QLoRA compared to LoRA is now largely recovered.
  • How many parameters in base model?
    • If you're a beginner, it is best to start with a small instruct model like Llama 3.1 (8B) and experiment from there.
  • Research shows that training and serving in the same precision helps preserve accuracy. This means if you want to serve in 4-bit, train in 4-bit and vice versa. 
  • Full fine-tuning isn't needed
    • Unsloth also supports full fine-tuning (FFT) and pretraining, which require significantly more resources, but FFT is usually unnecessary. When done correctly, LoRA can match FFT.
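Some back-of-the-envelope arithmetic shows why LoRA (and QLoRA) is so much cheaper than full fine-tuning: only small low-rank adapter matrices are trained while the base weights stay frozen. The dimensions and rank below are illustrative assumptions (roughly one attention projection in an ~8B model, with a common default rank of 16), not figures from Unsloth's docs:

```python
# Back-of-the-envelope: trainable parameters for LoRA vs. full fine-tuning.
# The 4096x4096 layer size and rank r=16 are illustrative assumptions.

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A LoRA adapter replaces a d_in x d_out weight update with two
    low-rank factors: A (d_in x r) and B (r x d_out)."""
    return d_in * r + r * d_out

full = 4096 * 4096                      # parameters updated by full fine-tuning
lora = lora_params(4096, 4096, r=16)    # parameters updated by LoRA

print(full, lora, f"{lora / full:.1%}")
```

Summed over all layers the trained fraction stays around a percent of the model, which is why LoRA fits on modest hardware; QLoRA goes further by also storing the frozen base weights in 4-bit.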

Source: https://unsloth.ai/docs/get-started/fine-tuning-llms-guide/what-model-should-i-use#instruct-or-base-model

  • Instruct models are pre-trained with built-in instructions, making them ready to use without any fine-tuning. These models, including GGUFs and others commonly available, are optimized for direct usage and respond effectively to prompts right out of the box. Instruct models work with conversational chat templates like ChatML or ShareGPT.

Source: https://unsloth.ai/docs/get-started/fine-tuning-llms-guide/what-model-should-i-use#should-i-choose-instruct-or-base  

  • Should I Choose Instruct or Base?
    • Less than 300 Rows: For smaller datasets, the instruct model is typically the better choice. Fine-tuning the instruct model enables it to align with specific needs while preserving its built-in instructional capabilities. This ensures it can follow general instructions without additional input unless you intend to significantly alter its functionality.

Source: https://unsloth.ai/docs/get-started/fine-tuning-llms-guide/datasets-guide#how-big-should-my-dataset-be 

  • How big should my dataset be? We generally recommend using a bare minimum of at least 100 rows of data for fine-tuning to achieve reasonable results. For optimal performance, a dataset with over 1,000 rows is preferable, and in this case, more data usually leads to better outcomes. If your dataset is too small you can also add synthetic data or add a dataset from Hugging Face to diversify it. However, the effectiveness of your fine-tuned model depends heavily on the quality of the dataset, so be sure to thoroughly clean and prepare your data.
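The guidance above (at least 100 rows, thoroughly cleaned) is easy to enforce with a small pre-training check. This is a hypothetical helper of my own, not anything from the Unsloth docs; the `instruction`/`output` field names are just one common dataset convention:

```python
# Minimal dataset sanity pass before fine-tuning: drop incomplete rows
# and exact duplicates. `rows` is assumed to be a list of
# {"instruction": ..., "output": ...} dicts (field names vary by dataset).

def clean_dataset(rows: list[dict]) -> list[dict]:
    seen = set()
    cleaned = []
    for row in rows:
        instruction = row.get("instruction", "").strip()
        output = row.get("output", "").strip()
        if not instruction or not output:
            continue  # drop incomplete rows
        key = (instruction, output)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        cleaned.append({"instruction": instruction, "output": output})
    return cleaned

rows = [
    {"instruction": "Say hi", "output": "Hi!"},
    {"instruction": "Say hi", "output": "Hi!"},   # duplicate
    {"instruction": "", "output": "orphan"},      # incomplete
]
cleaned = clean_dataset(rows)
print(len(cleaned))  # only 1 row survives; a real run should keep >= 100
```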
 
We recommend starting with Instruct models, as they allow direct fine-tuning using conversational chat templates (ChatML, ShareGPT etc.) and require less data compared to Base models (which use Alpaca, Vicuna etc.).


My Hardware and Software choices

  • Using a MacBook Air M5 with 32 GB of unified memory
  • Why Unsloth?
    • No particular reason; it gets decent reviews on Reddit and seems easy to get started with.
  • As recommended by Unsloth's docs, I will start with the simplest setup first:
    • I want to fine-tune a small model first: Qwen 3.5 9B (around the 8B size recommended by Unsloth)
    • I will try QLoRA first as recommended by Unsloth docs.
    • A minimum of 100 rows for fine-tuning, but possibly up to 1,000 rows if Hugging Face or Codex can provide training data in the ChatML or ShareGPT format needed for Instruct models.
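A rough memory estimate suggests QLoRA on a ~9B model should fit comfortably in 32 GB. All figures below are back-of-the-envelope assumptions on my part (4-bit base weights, fp16 adapters with Adam-style optimizer state, and a flat allowance for activations at short sequence lengths), not measurements:

```python
# Rough QLoRA memory estimate for a ~9B-parameter model.
# Every constant here is an assumption for illustration only.

GB = 1024**3

def qlora_memory_gb(n_params: float, lora_frac: float = 0.01,
                    activation_gb: float = 6.0) -> float:
    base = n_params * 0.5 / GB                 # 4-bit weights: 0.5 bytes/param
    adapters = n_params * lora_frac * 2 / GB   # fp16 LoRA adapter weights
    optimizer = adapters * 4                   # grads + Adam moments (fp32-ish)
    return base + adapters + optimizer + activation_gb

est = qlora_memory_gb(9e9)
print(f"~{est:.1f} GB")  # well under the 32 GB available
```

If this holds, there is plenty of headroom for longer sequences or a larger LoRA rank on the 32 GB machine.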

Related post