
Fine-tuning is about shaping behavior, not just adding knowledge.

Moving beyond off-the-shelf frontier models requires a rigorous approach to adaptation. Learn how to surgically adjust a model's habits, structure, and domain expertise.

Habit vs. Knowledge

Fine-tuning is surgery on the model's internal habits—its tone, formatting preferences, and reasoning steps. While it can instill some domain knowledge, it is far better at teaching *how* to answer than *what* to answer. For factual accuracy and evolving data, retrieval-augmented generation (RAG) remains superior.

Efficiency as a Priority

For small and medium LLMs (1B–30B parameters), fine-tuning is an efficiency multiplier. By adapting a model to a narrow task via LoRA or QLoRA, you can often reach the accuracy of a model 10x its size while maintaining massive advantages in latency, cost, and local deployability.

The Evaluation Trap

A common failure is assuming a model is 'better' because it sounds more confident. True fine-tuning success requires a rigorous evaluation pipeline: comparing the base and tuned models on a static benchmark of real-world edge cases where the base model previously failed.
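A minimal sketch of such a regression benchmark, assuming hypothetical `base_model` and `tuned_model` callables (stubbed here) and a task where success means emitting JSON:

```python
# Minimal sketch of a static regression benchmark. `base_model` and
# `tuned_model` are hypothetical stand-ins for real inference calls.

def base_model(prompt: str) -> str:
    # Stub: pretend the base model answers in free-form prose.
    return "free-form answer"

def tuned_model(prompt: str) -> str:
    # Stub: pretend the tuned model emits the required JSON wrapper.
    return '{"answer": "..."}'

# Edge cases where the base model previously failed.
benchmark = [
    "Summarize this ticket as JSON.",
    "Extract the invoice fields as JSON.",
]

def passes(output: str) -> bool:
    # Task-specific pass/fail check (here: output must be JSON-shaped).
    s = output.strip()
    return s.startswith("{") and s.endswith("}")

base_wins = sum(passes(base_model(p)) for p in benchmark)
tuned_wins = sum(passes(tuned_model(p)) for p in benchmark)
print(f"base: {base_wins}/{len(benchmark)}, tuned: {tuned_wins}/{len(benchmark)}")
```

Because the benchmark is static, the same script can be rerun after every training iteration to catch regressions, not just wins.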

Family Fit

Not all model families tune equally well. For example, Granite models are positioned for enterprise and tool-use tuning, while Mistral models are widely reported to adapt quickly during SFT (Supervised Fine-Tuning) for creative and reasoning tasks.

The Fine-Tuning Lifecycle

1. Data Collection

The most critical step. High-quality tuning requires 'clean' data. For SFT, this means 1,000–5,000 diverse, high-quality examples that correctly represent the task. Quality always beats quantity.
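A sketch of what a 'clean' gate can look like in practice: dedupe and length-filter raw examples before writing them out as JSONL. The thresholds and field names are illustrative, not a standard.

```python
import json

# Illustrative quality gate for SFT examples; thresholds are arbitrary.
raw_examples = [
    {"prompt": "Classify the ticket: 'VPN drops hourly'", "response": "category: network"},
    {"prompt": "Classify the ticket: 'VPN drops hourly'", "response": "category: network"},  # exact duplicate
    {"prompt": "hi", "response": "ok"},  # too short to teach anything
]

def is_clean(ex: dict) -> bool:
    # Reject trivially short prompts and responses.
    return len(ex["prompt"]) >= 10 and len(ex["response"]) >= 5

seen, clean = set(), []
for ex in raw_examples:
    key = (ex["prompt"], ex["response"])
    if key not in seen and is_clean(ex):
        seen.add(key)
        clean.append(ex)

# Write the deduplicated, filtered set as JSONL for the trainer.
jsonl = "\n".join(json.dumps(ex) for ex in clean)
print(f"kept {len(clean)} of {len(raw_examples)} examples")
```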

2. Base Model Selection

Pick an anchor model (Granite, Gemma, Qwen) with high 'intelligence density' for your specific domain. Some families are natively better at coding, while others excel at multilingual tasks.

3. Hyperparameter Tuning

Set the learning rate, rank (for LoRA), and batch size. If the learning rate is too high, the model 'catastrophically forgets' its base knowledge; if it is too low, the model fails to learn the new habits.
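As a rough illustration, common LoRA starting points and a crude guardrail on the learning rate; all values are typical starting points, not universal recommendations:

```python
# Illustrative LoRA hyperparameters; typical starting points, not universal.
config = {
    "learning_rate": 2e-4,   # common starting point for LoRA SFT
    "lora_rank": 16,         # higher rank = more capacity, more overfit risk
    "lora_alpha": 32,        # scaling factor; often set to 2x the rank
    "batch_size": 8,
    "epochs": 3,
}

# Crude guardrail: very high learning rates drive catastrophic forgetting,
# very low ones fail to move the weights at all.
def sanity_check(lr: float) -> str:
    if lr > 1e-3:
        return "too hot: risk of catastrophic forgetting"
    if lr < 1e-5:
        return "too cold: new habits may not stick"
    return "plausible range"

print(sanity_check(config["learning_rate"]))
```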

4. Iterative Evaluation

Testing the model against a hold-out set of data. This is where you measure ROUGE scores, perplexity, and most importantly, perform manual 'blind tests' between models.
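A sketch of how a blind test can be prepared: shuffle which model's answer appears first so reviewers cannot tell base from tuned. The outputs here are placeholders; the record shape is illustrative.

```python
import random

# Sketch of a blind A/B review sheet. Outputs are placeholder strings.
pairs = [
    ("base answer 1", "tuned answer 1"),
    ("base answer 2", "tuned answer 2"),
]

rng = random.Random(0)  # fixed seed so the sheet is reproducible
review_sheet = []
for base_out, tuned_out in pairs:
    options = [("base", base_out), ("tuned", tuned_out)]
    rng.shuffle(options)  # hide which model produced which answer
    # Reviewers see only "A" and "B"; the key is stored for later scoring.
    review_sheet.append({
        "A": options[0][1], "B": options[1][1],
        "_key": {"A": options[0][0], "B": options[1][0]},
    })

print(len(review_sheet), "blind comparisons prepared")
```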

Tuning Types & Architectures

From Supervised Fine-Tuning (SFT) to parameter-efficient adapters like LoRA, each method offers a different balance of control and compute.

Supervised Fine-Tuning (SFT)

The foundation of model alignment. You provide thousands of (Prompt, Response) pairs. The model learns to replicate the style, structure, and tone of the target examples. This is perfect for teaching a model to follow a specific JSON schema or a brand persona.
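One such (Prompt, Response) pair, written in the chat-message format many SFT trainers accept. The schema keys (`intent`, `priority`) and the record layout are illustrative.

```python
import json

# One SFT training record teaching a fixed JSON output schema.
# Field names and the target schema are illustrative.
example = {
    "messages": [
        {"role": "system", "content": "Always answer as JSON with keys 'intent' and 'priority'."},
        {"role": "user", "content": "The billing page times out when I click pay."},
        {"role": "assistant", "content": '{"intent": "billing_bug", "priority": "high"}'},
    ]
}

# The assistant turn is the behavior being cloned, so it must itself
# be valid against the schema it demonstrates.
target = json.loads(example["messages"][-1]["content"])
print(target["intent"])
```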

LoRA & QLoRA (PEFT)

The industry standard for practical teams. Low-Rank Adaptation (LoRA) injects small, trainable layers into the model while keeping the main weights frozen. QLoRA takes this further by quantizing the base model to 4-bit, enabling fine-tuning of 65B-class models on a single 48 GB GPU.
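A back-of-envelope view of why LoRA is cheap: for one frozen weight matrix W (d_out × d_in), LoRA trains only the low-rank factors B (d_out × r) and A (r × d_in). The dimensions below are typical of a mid-size transformer layer but are purely illustrative.

```python
# Parameter count for one weight matrix under full fine-tuning vs LoRA.
# W stays frozen; only B (d_out x r) and A (r x d_in) are trained.
d_out, d_in, r = 4096, 4096, 16

full_params = d_out * d_in          # updated by full fine-tuning
lora_params = d_out * r + r * d_in  # updated by LoRA

print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {full_params / lora_params:.0f}x fewer trainable params")
```

With these dimensions, LoRA trains roughly 1/128 of the parameters of a full update on that matrix, which is where the memory savings come from.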

Instruction Tuning

A specialized form of SFT where the training data is focused on following complex, multi-step instructions ('If X then Y, else Z'). This transforms a raw pre-trained model into a helpful assistant that can interpret intent across diverse domains.
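One instruction-tuning record with an explicit conditional rule, in an Alpaca-style instruction/input/output layout. The format and rule are illustrative.

```python
# One instruction-tuning record with an explicit 'If X then Y, else Z' rule.
# The instruction/input/output layout is illustrative (Alpaca-style).
record = {
    "instruction": (
        "If the ticket mentions an outage, reply with severity 'critical'; "
        "otherwise reply with severity 'routine'."
    ),
    "input": "Our whole branch office lost connectivity - total outage.",
    "output": "severity: critical",
}

# The supervision signal must actually honor the rule it states.
expected = "critical" if "outage" in record["input"].lower() else "routine"
assert record["output"].endswith(expected)
print("rule-consistent example")
```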

Preference Tuning (DPO/PPO)

Used to align models with human values or specific quality criteria. Instead of (Prompt, Response), you provide (Prompt, Better Response, Worse Response). Methods like Direct Preference Optimization (DPO) help the model learn what to avoid and what to prioritize.
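A toy version of the DPO loss on a single (Prompt, Better Response, Worse Response) triple. The log-probabilities below are made-up numbers standing in for sums of token log-probs from the policy (tuned) and frozen reference models; only the formula's shape is the point.

```python
import math

# Toy DPO loss: -log(sigmoid(beta * margin)), where the margin is how much
# more the policy prefers the better answer than the reference model does.
beta = 0.1  # typical small temperature on the implicit reward

logp = {
    "policy_better": -12.0, "policy_worse": -20.0,  # tuned model
    "ref_better": -14.0,    "ref_worse": -15.0,     # frozen reference model
}

margin = (logp["policy_better"] - logp["ref_better"]) \
       - (logp["policy_worse"] - logp["ref_worse"])
loss = -math.log(1 / (1 + math.exp(-beta * margin)))

# A positive margin means the policy has shifted probability mass toward
# the better answer relative to the reference; the loss rewards that.
print(f"margin={margin:.1f} loss={loss:.4f}")
```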

| Method | Weights Updated | Compute | Best For | Risk |
| --- | --- | --- | --- | --- |
| SFT | Usually partial or full | Medium | Task examples, formatting, assistant behavior | Can overfit to narrow examples or weak labels |
| Instruction tuning | Usually partial or full | Medium | Improving instruction following across many task styles | May sound helpful without becoming truly grounded |
| Full fine-tuning | All or most weights | High | Teams with strong infra and clear measurable gains | Expensive, easy to destabilize, hard to iterate |
| PEFT | Small subset | Low to medium | Practical adaptation with limited hardware | Can underperform if the task needs deeper changes |
| LoRA | Adapter layers only | Low | Fast adaptation of behavior and structure | Needs careful rank and training choices |
| QLoRA | Adapter layers on quantized base | Low | Memory-efficient tuning of larger models | Extra complexity from quantization choices |
| Preference tuning | Varies | Medium to high | Aligning output quality, safety, and style preferences | Weak preference data creates unstable gains |
| Domain adaptation | Varies | Medium | Legal, finance, telecom, medicine, education | Can become too narrow if evaluation is weak |

Strategic Data Design

Data Quality Secrets

The quality of your training set is the single most important factor in tuning performance. Successful teams prioritize data diversity and reasoning density over raw volume.

Synthetic data is a superpower.

If you don't have enough real-world logs, use a larger 'teacher' model (like Granite 34B or Qwen 72B) to generate high-quality synthetic examples for your smaller 'student' model.
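A sketch of the teacher-student loop, where `call_teacher` is a hypothetical stand-in (stubbed here) for a real inference call to the larger model:

```python
# Sketch of teacher-driven synthetic data. `call_teacher` is a hypothetical
# stand-in for an API call to a larger teacher model.

def call_teacher(instruction: str) -> str:
    # Stub response; in practice this would hit a real inference endpoint.
    return f"[teacher answer to: {instruction}]"

seed_tasks = [
    "Explain a VLAN to a new network technician.",
    "Draft a polite overdue-invoice reminder.",
]

# Each seed task becomes one (prompt, response) pair for the student's SFT set.
synthetic = [{"prompt": t, "response": call_teacher(t)} for t in seed_tasks]
print(len(synthetic), "synthetic examples")
```

In practice the teacher's outputs should pass the same quality gate as real data before entering the training set.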

Negative examples are necessary.

Don't just teach the model what to do. Teach it what NOT to do. Including examples of incorrect reasoning followed by corrections can significantly reduce hallucination rates.
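One way to encode this, as a contrastive record pairing a flawed answer with its correction. The `chosen`/`rejected` field names follow common preference-data conventions but are illustrative here.

```python
# One contrastive training record: a flawed answer plus its correction.
# The chosen/rejected field names are illustrative, not a standard schema.
record = {
    "prompt": "Is 0.1 + 0.2 exactly equal to 0.3 in IEEE-754 floats?",
    "rejected": "Yes, 0.1 + 0.2 equals 0.3.",  # the mistake to avoid
    "chosen": (
        "No. In binary floating point 0.1 and 0.2 are not exact, "
        "so 0.1 + 0.2 evaluates to 0.30000000000000004."
    ),
}

# Sanity check: the pair really is contrastive, not two copies of one answer.
assert record["chosen"] != record["rejected"]
print("contrastive pair ok")
```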

Diversify your prompt styles.

Models can become brittle if every training prompt follows the exact same template. Vary the phrasing and structure of your training prompts to ensure generalized reasoning.
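A minimal sketch of template diversification: render the same underlying task through several phrasings so no single template dominates the training set.

```python
import random

# Render one underlying task through several phrasings so the model
# does not overfit to a single prompt template.
templates = [
    "Classify this support ticket: {text}",
    "Ticket: {text}\nWhat category does it belong to?",
    "Assign a category to the following ticket.\n\n{text}",
]

rng = random.Random(42)  # seeded so the dataset build is reproducible

def render(text: str) -> str:
    return rng.choice(templates).format(text=text)

prompts = [render("VPN drops every hour") for _ in range(3)]
for p in prompts:
    print(p)
```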

Common Failure Modes

Noisy or contradictory training data

If examples disagree on format, tone, or task boundaries, the model often becomes inconsistent rather than specialized.

Training for the wrong objective

Teams sometimes tune because answers are outdated, when the real fix is retrieval, better context engineering, or a smaller task definition.

Overfitting to formatting

A model can appear improved because it follows output templates more closely while still failing the underlying task.

Skipping post-tuning evaluation

Without comparisons against the base model, regressions in reasoning, safety, or generality can go unnoticed.

The Decision Framework

Fine-tuning is a powerful tool, but it shouldn't be your first move. Use these signals to determine if your task truly rewards weight adaptation.

Tune when the model needs new habits.

If the model must consistently follow a domain tone, output schema, or workflow pattern, fine-tuning is often the right tool.

Do not tune when the knowledge changes daily.

If the main problem is access to evolving facts or documents, retrieval is usually a better first move than changing the weights.

Start with the cheapest viable method.

SFT with PEFT or LoRA is often enough before moving toward heavier full fine-tuning or preference optimization.

Evaluation decides whether tuning helped.

Without a benchmark set, failure cases, and regression review, teams often confuse style changes for real improvement.
