LoRA Notes

Low-Rank Adaptation guide and scripts

What is LoRA?

Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method for large language models. Instead of updating all parameters of a pre-trained model, LoRA adds trainable low-rank matrices to specific layers, significantly reducing the number of trainable parameters.
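
In practice, LoRA is usually applied through a library rather than wired up by hand. Below is a minimal sketch using the Hugging Face peft package; the base model ("gpt2") and the target_modules choice are illustrative assumptions, not requirements of LoRA itself.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM works

config = LoraConfig(
    r=8,                        # low-rank dimension of the adapter
    lora_alpha=16,              # scaling factor (effective scale = alpha / r)
    target_modules=["c_attn"],  # GPT-2's fused attention projection (illustrative)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts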

Key Benefits

  • Memory Efficient: Trains only ~0.5-1% as many parameters as the original model, shrinking gradient and optimizer-state memory accordingly
  • Fast Training: Significantly faster than full fine-tuning, since only the small adapter matrices receive gradients
  • Modular: Easy to swap or combine multiple LoRA adapters on one frozen base model (see the merging sketch after this list)
  • Storage Friendly: Small adapter files (~MBs vs. GBs for full checkpoints)
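
To make the modularity point concrete, here is a minimal sketch of combining adapters in plain PyTorch. The shapes, mixing weights, and random tensors are illustrative stand-ins, not a real model:

import torch

m, n, r = 512, 512, 8
W = torch.randn(m, n)  # frozen base weight (stand-in)

# Two independently trained adapters, each a pair (A: m × r, B: r × n).
adapters = [(torch.randn(m, r), torch.randn(r, n)) for _ in range(2)]
scales = [1.0, 0.5]    # per-adapter mixing weights (illustrative)

# Combining adapters is just a weighted sum of their low-rank updates:
# W' = W + Σ scaleᵢ · (Aᵢ × Bᵢ)
W_merged = W + sum(s * (A @ B) for s, (A, B) in zip(scales, adapters))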

Quick Start Summary

LoRA enables efficient fine-tuning by adding small, trainable matrices to specific layers instead of updating all model parameters. This approach typically cuts the number of trainable parameters by 99% or more (and with it the memory needed for gradients and optimizer state) while keeping quality close to full fine-tuning.
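
The ~99% figure is easy to verify with back-of-envelope arithmetic. For a single 4096 × 4096 weight matrix and rank r = 8 (illustrative numbers):

full = 4096 * 4096        # parameters updated by full fine-tuning: 16,777,216
lora = 8 * (4096 + 4096)  # parameters in A (m × r) and B (r × n): 65,536
print(lora / full)        # ≈ 0.0039, i.e. about 0.4% of the original count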

Basic LoRA Concept

# Instead of updating W (the full m × n weight matrix),
# LoRA learns an additive update: W' = W + ΔW, where ΔW = A × B

# A: m × r matrix (trainable)
# B: r × n matrix (trainable)
# r: low-rank dimension, with r << min(m, n)
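
The same idea as a runnable sketch in PyTorch. The class name LoRALinear, the init scheme (Gaussian A, zero B, so ΔW starts at zero), and the alpha / r scaling are common conventions assumed here, not the only valid ones:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # A frozen linear layer plus a trainable low-rank update.
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # freeze the pre-trained W

        # Trainable factors of ΔW = A × B (here A: out × r, B: r × in).
        self.A = nn.Parameter(torch.randn(out_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, in_features))  # ΔW starts at zero
        self.scale = alpha / r

    def forward(self, x):
        # (x @ B.T) @ A.T equals x @ ΔW.T without materializing the full ΔW.
        return self.base(x) + self.scale * ((x @ self.B.T) @ self.A.T)

layer = LoRALinear(512, 512, r=8)
y = layer(torch.randn(4, 512))  # only A and B receive gradients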

LoRA vs. Traditional Fine-tuning Comparison

[Diagram not rendered: comparison of LoRA and full fine-tuning]

LoRA Training Pipeline

[Diagram not rendered: LoRA training pipeline]