LoRA Notes

Low-Rank Adaptation guide and scripts

What is LoRA?

Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method for large language models. Instead of updating all parameters of a pre-trained model, LoRA adds trainable low-rank matrices to specific layers, significantly reducing the number of trainable parameters.
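
In practice, LoRA is usually applied through a library rather than wired up by hand. Below is a minimal sketch using the Hugging Face peft package; the base model ("gpt2") and the target_modules choice are illustrative assumptions, not requirements of LoRA itself.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM works

config = LoraConfig(
    r=8,                        # low-rank dimension of the adapter
    lora_alpha=16,              # scaling factor (effective scale = alpha / r)
    target_modules=["c_attn"],  # GPT-2's fused attention projection (illustrative)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts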

Key Benefits

  • Memory Efficient: Trains only ~0.5-1% as many parameters as the original model, shrinking gradient and optimizer-state memory accordingly
  • Fast Training: Significantly faster than full fine-tuning, since only the small adapter matrices receive gradients
  • Modular: Easy to swap or combine multiple LoRA adapters on one frozen base model (see the merging sketch after this list)
  • Storage Friendly: Small adapter files (~MBs vs. GBs for full checkpoints)
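
To make the modularity point concrete, here is a minimal sketch of combining adapters in plain PyTorch. The shapes, mixing weights, and random tensors are illustrative stand-ins, not a real model:

import torch

m, n, r = 512, 512, 8
W = torch.randn(m, n)  # frozen base weight (stand-in)

# Two independently trained adapters, each a pair (A: m × r, B: r × n).
adapters = [(torch.randn(m, r), torch.randn(r, n)) for _ in range(2)]
scales = [1.0, 0.5]    # per-adapter mixing weights (illustrative)

# Combining adapters is just a weighted sum of their low-rank updates:
# W' = W + Σ scaleᵢ · (Aᵢ × Bᵢ)
W_merged = W + sum(s * (A @ B) for s, (A, B) in zip(scales, adapters))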

Quick Start Summary

LoRA enables efficient fine-tuning by adding small, trainable matrices to specific layers instead of updating all model parameters. This approach typically cuts the number of trainable parameters by 99% or more (and with it the memory needed for gradients and optimizer state) while keeping quality close to full fine-tuning.
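
The ~99% figure is easy to verify with back-of-envelope arithmetic. For a single 4096 × 4096 weight matrix and rank r = 8 (illustrative numbers):

full = 4096 * 4096        # parameters updated by full fine-tuning: 16,777,216
lora = 8 * (4096 + 4096)  # parameters in A (m × r) and B (r × n): 65,536
print(lora / full)        # ≈ 0.0039, i.e. about 0.4% of the original count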

Basic LoRA Concept

# Instead of updating W (the full m × n weight matrix),
# LoRA learns an additive update: W' = W + ΔW, where ΔW = A × B

# A: m × r matrix (trainable)
# B: r × n matrix (trainable)
# r: low-rank dimension, with r << min(m, n)
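
The same idea as a runnable sketch in PyTorch. The class name LoRALinear, the init scheme (Gaussian A, zero B, so ΔW starts at zero), and the alpha / r scaling are common conventions assumed here, not the only valid ones:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # A frozen linear layer plus a trainable low-rank update.
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # freeze the pre-trained W

        # Trainable factors of ΔW = A × B (here A: out × r, B: r × in).
        self.A = nn.Parameter(torch.randn(out_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, in_features))  # ΔW starts at zero
        self.scale = alpha / r

    def forward(self, x):
        # (x @ B.T) @ A.T equals x @ ΔW.T without materializing the full ΔW.
        return self.base(x) + self.scale * ((x @ self.B.T) @ self.A.T)

layer = LoRALinear(512, 512, r=8)
y = layer(torch.randn(4, 512))  # only A and B receive gradients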

LoRA vs. Traditional Fine-tuning Comparison

[Diagram not rendered: comparison of LoRA and full fine-tuning]

LoRA Training Pipeline

[Diagram not rendered: LoRA training pipeline]