# Diffusion Model LoRA Training

Olive provides the `SDLoRA` pass for training LoRA (Low-Rank Adaptation) adapters on diffusion models. This enables efficient fine-tuning of large image-generation models with minimal GPU memory requirements.

## Supported Models

| Model Type | Examples | Resolution | Notes |
|------------|----------|------------|-------|
| SD 1.5 | `runwayml/stable-diffusion-v1-5` | 512 | Standard Stable Diffusion |
| SDXL | `stabilityai/stable-diffusion-xl-base-1.0` | 1024 | Dual CLIP encoders |
| Flux | `black-forest-labs/FLUX.1-dev` | 1024 | DiT architecture, requires bfloat16 |

## Quick Start with CLI

The easiest way to train a LoRA adapter is with the `olive diffusion-lora` command.

### Basic Usage

```bash
# Train with local images
olive diffusion-lora -m runwayml/stable-diffusion-v1-5 -d ./train_images

# Train with a HuggingFace dataset
olive diffusion-lora -m runwayml/stable-diffusion-v1-5 --data_name linoyts/Tuxemon --caption_column prompt

# Train SDXL
olive diffusion-lora -m stabilityai/stable-diffusion-xl-base-1.0 -d ./train_images

# Train Flux
olive diffusion-lora -m black-forest-labs/FLUX.1-dev -d ./train_images -r 32
```

### CLI Options

#### Model Options

| Option | Description |
|--------|-------------|
| `-m, --model_name_or_path` | HuggingFace model name or local path (required) |
| `-o, --output_path` | Output path for the LoRA adapter (default: `diffusion-lora-adapter`) |
| `--model_variant` | Model variant: `auto`, `sd15`, `sdxl`, `flux` (default: `auto`) |

#### LoRA Options

| Option | Default | Description |
|--------|---------|-------------|
| `-r, --lora_r` | 16 | LoRA rank (SD: 4-16, Flux: 16-64) |
| `--alpha` | Same as `r` | LoRA alpha for scaling |
| `--lora_dropout` | 0.0 | LoRA dropout probability |
| `--target_modules` | Auto | Target modules (comma-separated) |
| `--merge_lora` | False | Merge LoRA into the base model |
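
To see how `r` and `alpha` interact: in the PEFT convention the trained low-rank matrices are scaled by `alpha / r` before being added to the frozen weight, so leaving `--alpha` equal to `r` keeps the update at unit scale. A minimal sketch (shapes are illustrative, not Olive's internals):

```python
import torch

d, r, alpha = 768, 16, 16          # hidden size, LoRA rank, LoRA alpha
W = torch.randn(d, d)              # frozen base weight
A = torch.randn(r, d) * 0.01       # trainable, small random init
B = torch.zeros(d, r)              # trainable, zero init so training starts at the base model

W_eff = W + (alpha / r) * (B @ A)  # effective weight applied at inference
```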

#### Data Options

| Option | Description |
|--------|-------------|
| `-d, --data_dir` | Path to a local image folder |
| `--data_name` | HuggingFace dataset name |
| `--data_split` | Dataset split (default: `train`) |
| `--image_column` | Image column name (default: `image`) |
| `--caption_column` | Caption column name |
| `--base_resolution` | Base resolution (auto-detected from the model type) |

#### Training Options

| Option | Default | Description |
|--------|---------|-------------|
| `--max_train_steps` | 1000 | Maximum training steps |
| `--learning_rate` | 1e-4 | Learning rate |
| `--train_batch_size` | 1 | Training batch size |
| `--gradient_accumulation_steps` | 4 | Gradient accumulation steps |
| `--mixed_precision` | bf16 | Mixed precision: `no`, `fp16`, `bf16` |
| `--lr_scheduler` | constant | LR scheduler type |
| `--lr_warmup_steps` | 0 | Warmup steps |
| `--seed` | None | Random seed |

With the defaults, the effective batch size is `train_batch_size` × `gradient_accumulation_steps` = 1 × 4 = 4.

#### DreamBooth Options

| Option | Default | Description |
|--------|---------|-------------|
| `--dreambooth` | False | Enable DreamBooth training |
| `--prior_loss_weight` | 1.0 | Prior preservation loss weight |
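
In the standard DreamBooth formulation, `prior_loss_weight` balances fitting the new subject against preserving the model's existing class prior. A minimal sketch of how the two loss terms combine (dummy tensors stand in for the UNet's noise predictions; this illustrates the published formulation, not Olive's exact code):

```python
import torch
import torch.nn.functional as F

prior_loss_weight = 1.0

# Dummy noise predictions/targets for the instance (subject) and prior (class) batches.
pred_inst, tgt_inst = torch.randn(2, 4, 64, 64), torch.randn(2, 4, 64, 64)
pred_prior, tgt_prior = torch.randn(2, 4, 64, 64), torch.randn(2, 4, 64, 64)

instance_loss = F.mse_loss(pred_inst, tgt_inst)  # learn the new subject
prior_loss = F.mse_loss(pred_prior, tgt_prior)   # keep the class prior intact
loss = instance_loss + prior_loss_weight * prior_loss
```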

#### Flux Options

| Option | Default | Description |
|--------|---------|-------------|
| `--guidance_scale` | 3.5 | Guidance scale for Flux training |

### CLI Examples

```bash
# SD 1.5 with custom training settings
olive diffusion-lora \
    -m runwayml/stable-diffusion-v1-5 \
    -d ./train_images \
    -r 4 \
    --max_train_steps 500 \
    --learning_rate 5e-5 \
    -o my-lora

# SDXL with a HuggingFace dataset
olive diffusion-lora \
    -m stabilityai/stable-diffusion-xl-base-1.0 \
    --data_name linoyts/Tuxemon \
    --caption_column prompt \
    -r 16 \
    --max_train_steps 2000

# Flux with a higher rank
olive diffusion-lora \
    -m black-forest-labs/FLUX.1-dev \
    -d ./train_images \
    -r 32 \
    --mixed_precision bf16 \
    --guidance_scale 3.5

# DreamBooth training
olive diffusion-lora \
    -m runwayml/stable-diffusion-v1-5 \
    -d ./train_images \
    --dreambooth \
    --prior_loss_weight 1.0

# Merge the LoRA into the base model
olive diffusion-lora \
    -m runwayml/stable-diffusion-v1-5 \
    -d ./train_images \
    --merge_lora
```

## Training Data Structure

Prepare your training images with corresponding caption files:

```text
train_images/
├── image1.png
├── image1.txt    # Contains: "a photo of sks dog"
├── image2.jpg
├── image2.txt    # Contains: "sks dog playing in the park"
└── ...
```
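
Before training, it can help to verify that every image has a caption. A small sanity-check script (a hypothetical helper, not part of Olive):

```python
from pathlib import Path

data_dir = Path("train_images")
for img in sorted(data_dir.iterdir()):
    if img.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
        continue
    caption = img.with_suffix(".txt")
    if caption.exists():
        print(f"{img.name}: {caption.read_text().strip()}")
    else:
        print(f"{img.name}: MISSING CAPTION")
```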

## Configuration File

For more complex workflows or integration with other Olive passes, use a JSON configuration file.

### Minimal Configuration

```json
{
    "input_model": {
        "type": "DiffusersModel",
        "model_path": "runwayml/stable-diffusion-v1-5"
    },
    "systems": {
        "local_system": {
            "type": "LocalSystem",
            "accelerators": [{"device": "gpu"}]
        }
    },
    "data_configs": [
        {
            "name": "train_data",
            "type": "ImageDataContainer",
            "load_dataset_config": {
                "type": "image_folder_dataset",
                "params": {"data_dir": "train_images"}
            }
        }
    ],
    "passes": {
        "sd_lora": {
            "type": "SDLoRA",
            "train_data_config": "train_data"
        }
    },
    "host": "local_system",
    "target": "local_system",
    "output_dir": "output"
}
```

Run with:

```bash
olive run --config config.json
```
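
Workflows can also be launched from Python. A minimal sketch, assuming the `olive.workflows.run` entry point available in recent Olive releases:

```python
# Equivalent to `olive run --config config.json` (assumed entry point).
from olive.workflows import run as olive_run

olive_run("config.json")
```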

### Using HuggingFace Datasets

```json
{
    "input_model": {
        "type": "DiffusersModel",
        "model_path": "runwayml/stable-diffusion-v1-5"
    },
    "data_configs": [
        {
            "name": "train_data",
            "type": "ImageDataContainer",
            "load_dataset_config": {
                "type": "huggingface_dataset",
                "params": {
                    "data_name": "linoyts/Tuxemon",
                    "split": "train",
                    "image_column": "image",
                    "caption_column": "prompt"
                }
            }
        }
    ],
    "passes": {
        "sd_lora": {
            "type": "SDLoRA",
            "train_data_config": "train_data",
            "r": 4,
            "training_args": {
                "max_train_steps": 1000,
                "train_batch_size": 4
            }
        }
    }
}
```

## SDLoRA Pass Configuration

### Basic Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model_variant` | str | `"auto"` | Model variant: `"sd15"`, `"sdxl"`, `"flux"`, or `"auto"` |
| `r` | int | 16 | LoRA rank |
| `alpha` | float | None | LoRA alpha (defaults to `r`) |
| `lora_dropout` | float | 0.0 | Dropout probability |
| `target_modules` | list | None | Target modules (auto-detected if None) |
| `merge_lora` | bool | False | Merge LoRA into the base model |

### DreamBooth

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dreambooth` | bool | False | Enable DreamBooth training (for learning specific subjects) |
| `prior_loss_weight` | float | 1.0 | Weight of the prior preservation loss (applies only when `dreambooth=True`) |

### Training Arguments

Configure via `training_args`:

```json
{
    "type": "SDLoRA",
    "train_data_config": "train_data",
    "training_args": {
        "learning_rate": 1e-4,
        "max_train_steps": 1000,
        "train_batch_size": 1,
        "gradient_accumulation_steps": 4,
        "gradient_checkpointing": true,
        "mixed_precision": "bf16",
        "lr_scheduler": "constant",
        "lr_warmup_steps": 0,
        "checkpointing_steps": 500,
        "logging_steps": 10
    }
}
```

| Argument | Default | Description |
|----------|---------|-------------|
| `learning_rate` | 1e-4 | Learning rate |
| `max_train_steps` | 1000 | Maximum training steps |
| `train_batch_size` | 1 | Batch size |
| `gradient_accumulation_steps` | 4 | Gradient accumulation |
| `gradient_checkpointing` | True | Enable gradient checkpointing |
| `mixed_precision` | `"bf16"` | Mixed precision mode (`"fp16"`, `"bf16"`, `"no"`) |
| `lr_scheduler` | `"constant"` | LR scheduler type |
| `lr_warmup_steps` | 0 | Warmup steps |
| `max_grad_norm` | 1.0 | Max gradient norm |
| `snr_gamma` | None | SNR gamma for Min-SNR weighting |
| `checkpointing_steps` | 500 | Save a checkpoint every N steps |
| `logging_steps` | 10 | Log every N steps |
| `seed` | None | Random seed |
| `guidance_scale` | 3.5 | Flux only: guidance scale |
| `use_prodigy` | False | Flux only: use the Prodigy optimizer |
| `prodigy_beta3` | None | Flux only: Prodigy beta3 parameter |
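
`snr_gamma` enables Min-SNR loss weighting (Hang et al., 2023), which rescales each timestep's loss by `min(SNR, gamma) / SNR` so that low-noise timesteps do not dominate training. A sketch of the published weighting for epsilon prediction (not necessarily Olive's exact code):

```python
import torch

def min_snr_weights(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # Per-timestep loss weight: min(SNR, gamma) / SNR.
    # High-SNR (low-noise) timesteps are clamped to gamma, down-weighting them.
    return torch.clamp(snr, max=gamma) / snr
```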

## Data Configuration

Use `ImageDataContainer` with `image_lora_preprocess` for automatic data preprocessing.

### Local Image Folder

```json
{
    "data_configs": [
        {
            "name": "train_data",
            "type": "ImageDataContainer",
            "load_dataset_config": {
                "type": "image_folder_dataset",
                "params": {"data_dir": "train_images"}
            }
        }
    ]
}
```

### HuggingFace Dataset

```json
{
    "data_configs": [
        {
            "name": "train_data",
            "type": "ImageDataContainer",
            "load_dataset_config": {
                "type": "huggingface_dataset",
                "params": {
                    "data_name": "linoyts/Tuxemon",
                    "split": "train",
                    "image_column": "image",
                    "caption_column": "prompt"
                }
            }
        }
    ]
}
```

### Preprocessing Chain

The preprocessing chain supports multiple steps:

| Step | Default | Description |
|------|---------|-------------|
| `image_filtering` | Disabled | Filter low-quality images |
| `auto_caption` | Disabled | Generate captions with a VLM |
| `auto_tagging` | Disabled | Generate tags with WD14 |
| `image_resizing` | Disabled | Resize images to a fixed size |
| `aspect_ratio_bucketing` | Enabled | Group images by aspect ratio |

The default preprocessing applies `aspect_ratio_bucketing` with `base_resolution=512`.

### Custom Preprocessing

```json
{
    "data_configs": [
        {
            "name": "train_data",
            "type": "ImageDataContainer",
            "load_dataset_config": {
                "type": "image_folder_dataset",
                "params": {"data_dir": "train_images"}
            },
            "pre_process_data_config": {
                "type": "image_lora_preprocess",
                "params": {
                    "base_resolution": 1024,
                    "output_dir": "resized_images",
                    "steps": {
                        "auto_caption": {
                            "model_type": "florence2",
                            "trigger_word": "sks"
                        },
                        "aspect_ratio_bucketing": {}
                    }
                }
            }
        }
    ]
}
```

### Auto Captioning

Automatically generate captions using vision-language models:

#### Supported Captioning Models

| Model | Default Model Name | Features |
|-------|--------------------|----------|
| `blip2` | `Salesforce/blip2-opt-2.7b` | General captions |
| `florence2` | `microsoft/Florence-2-large` | Detailed descriptions |

#### Auto Caption Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `model_type` | `"blip2"` | Captioning model (`"blip2"` or `"florence2"`) |
| `model_name` | None | Custom model path |
| `trigger_word` | None | Trigger word to prepend to all captions (e.g., `"sks"`) |
| `overwrite` | False | Overwrite existing captions |
| `device` | `"cuda"` | Device for inference |
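
For intuition, the captioning step runs a standard image-to-text pipeline. Here is roughly what BLIP-2 captioning looks like with plain `transformers` (a sketch of the general approach, not Olive's internal code; the trigger-word prepend is an illustration):

```python
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("train_images/image1.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=50)
caption = processor.batch_decode(out, skip_special_tokens=True)[0].strip()

trigger_word = "sks"
print(f"{trigger_word} {caption}")  # e.g. what would be written to image1.txt
```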

### Aspect Ratio Bucketing

Groups images by aspect ratio to minimize padding and improve training quality; a sketch of the idea follows the list below.

Set `base_resolution` based on your model:

- SD 1.5: `base_resolution=512` (default)
- SDXL/Flux: `base_resolution=1024`
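
Conceptually, bucketing builds a set of width/height pairs whose pixel count stays near `base_resolution**2`, then assigns each image to the bucket with the closest aspect ratio so batches need little cropping or padding. A minimal sketch of that idea (hypothetical helpers, not Olive's implementation):

```python
def make_buckets(base_resolution: int = 512, step: int = 64, max_ar: float = 2.0):
    """Width/height pairs with ~base_resolution**2 pixels, dims multiples of `step`."""
    buckets = []
    width = step
    while width <= base_resolution * max_ar:
        height = int(base_resolution**2 / width) // step * step
        if height >= step and max(width / height, height / width) <= max_ar:
            buckets.append((width, height))
        width += step
    return buckets

def assign_bucket(width: int, height: int, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ar = width / height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - ar))

print(assign_bucket(800, 600, make_buckets()))  # e.g. (576, 448)
```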

## Model-Specific Examples

### Stable Diffusion 1.5

```json
{
    "input_model": {
        "type": "DiffusersModel",
        "model_path": "runwayml/stable-diffusion-v1-5"
    },
    "data_configs": [
        {
            "name": "train_data",
            "type": "ImageDataContainer",
            "load_dataset_config": {
                "type": "image_folder_dataset",
                "params": {"data_dir": "train_images"}
            }
        }
    ],
    "passes": {
        "sd_lora": {
            "type": "SDLoRA",
            "train_data_config": "train_data",
            "r": 4
        }
    }
}
```

### SDXL

```json
{
    "input_model": {
        "type": "DiffusersModel",
        "model_path": "stabilityai/stable-diffusion-xl-base-1.0"
    },
    "data_configs": [
        {
            "name": "train_data",
            "type": "ImageDataContainer",
            "load_dataset_config": {
                "type": "image_folder_dataset",
                "params": {"data_dir": "train_images"}
            },
            "pre_process_data_config": {
                "type": "image_lora_preprocess",
                "params": {
                    "base_resolution": 1024
                }
            }
        }
    ],
    "passes": {
        "sd_lora": {
            "type": "SDLoRA",
            "train_data_config": "train_data",
            "r": 16
        }
    }
}
```

### Flux

```json
{
    "input_model": {
        "type": "DiffusersModel",
        "model_path": "black-forest-labs/FLUX.1-dev"
    },
    "data_configs": [
        {
            "name": "train_data",
            "type": "ImageDataContainer",
            "load_dataset_config": {
                "type": "image_folder_dataset",
                "params": {"data_dir": "train_images"}
            },
            "pre_process_data_config": {
                "type": "image_lora_preprocess",
                "params": {
                    "base_resolution": 1024
                }
            }
        }
    ],
    "passes": {
        "sd_lora": {
            "type": "SDLoRA",
            "train_data_config": "train_data",
            "r": 32,
            "training_args": {
                "mixed_precision": "bf16",
                "guidance_scale": 3.5
            }
        }
    }
}
```

Note: Flux requires bfloat16; the pass automatically switches from float16 if needed.

## Inference

After training, load the LoRA weights using diffusers:

```python
from diffusers import StableDiffusionXLPipeline
import torch

# Load the base model
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

# Load and fuse the LoRA adapter
pipe.load_lora_weights("output/adapter")
pipe.fuse_lora(lora_scale=1.0)

# Generate
image = pipe(
    "a photo of sks dog in a garden",
    num_inference_steps=30,
    guidance_scale=7.5
).images[0]

image.save("output.png")
```

## Tips

1. Memory: Enable `gradient_checkpointing` and reduce `train_batch_size` if you hit out-of-memory errors.

2. Quality: Use 10-20 high-quality, diverse training images.

3. Captions: Include a unique trigger word (e.g., "sks") in all captions.

4. LoRA Rank: Start with `r=4-16` for SD and `r=16-64` for Flux.

5. Overfitting: Monitor the training loss; reduce steps if outputs look too similar to the training data.

6. Inference Scale: Use `lora_scale=0.7-0.8` if the LoRA effect is too strong (see the snippet below).
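
Continuing the inference example above, the adapter's strength can be lowered when fusing it with diffusers' `fuse_lora`:

```python
# Fuse the adapter at reduced strength instead of the default 1.0
pipe.load_lora_weights("output/adapter")
pipe.fuse_lora(lora_scale=0.75)
```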

## Dependencies

Install required dependencies:

```bash
pip install "olive-ai[sd-lora]"

# Or manually (version specifiers are quoted so the shell does not interpret ">="):
pip install "accelerate>=0.30.0" peft "diffusers>=0.25.0" "transformers>=4.30.0"
```

For auto-captioning:

```bash
pip install "transformers>=4.30.0"  # For BLIP-2 and Florence-2
```