Releases: bghira/SimpleTuner
v2.1.2 - Hugging Face backend speedup, QoL, and Bugfixes

Training Enhancements
- Cosmos PEFT LoRA Training: Added support for Cosmos PEFT LoRA training, expanding fine-tuning capabilities
- Model Family-Based Caching: Introduced the `{model_family}` magic string for cache paths, allowing shared caches across different model types (see the sketch after this list)
- Enhanced Configuration Tool: Completely revamped `configure.py` with an ncurses interface for easier configuration management and the ability to load existing configs
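For example, a dataloader entry can point its caches at a templated path so one config serves several model families. This is a minimal sketch only; the surrounding key names (`cache_dir_vae`, `cache_dir`, and friends) follow the usual dataloader layout but should be checked against your own config rather than copied verbatim:

```json
[
  {
    "id": "my-images",
    "type": "local",
    "instance_data_dir": "/data/images",
    "caption_strategy": "textfile",
    "cache_dir_vae": "cache/vae/{model_family}/my-images"
  },
  {
    "id": "text-embeds",
    "dataset_type": "text_embeds",
    "type": "local",
    "default": true,
    "cache_dir": "cache/text/{model_family}"
  }
]
```

Switching the model family between runs then reuses the same dataset definition while keeping each family's latents and text embeds in separate directories.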
Testing & Development
- End-to-End Test Runner: New CLI tool for comprehensive end-to-end testing across different model configurations
- Example Configurations: Added complete example configurations for LoRA and Lycoris training with documentation
Performance Optimizations
- Multi-threaded Caption Loading: Significantly faster HuggingFace caption loading through parallel processing
- Improved VAE Caching: Better cache handling for HuggingFace datasets with automatic path resolution
- Multi-GPU Optimizations: Cleaned up logging and added barrier synchronization to prevent race conditions
Model Support
- Torch 2.7.1: Updated NVIDIA and Apple systems to PyTorch 2.7.1
- Lumina2 Fixes: Fixed text encoder loading and attention masking issues
- PixArt Improvements: Resolved VAE initialization and transformer reference issues
- HiDream Canvas Limits: Enforced 1024² canvas size limit for stability with aspect bucketing
Bug Fixes
- Fixed Flux ControlNet regression with list-to-dict conversion
- Resolved eval dataset splitting issues; eval splits are now properly skipped during training
- Fixed collation of text encoder outputs for legacy prompt attention masks
- Corrected instance prompt handling to support more familiar syntax
- Fixed PixArt VAE assumption (f=8) when loading without a VAE
Documentation
- Updated OPTIONS.md with comprehensive configuration options
- Added README for example configurations with venv activation instructions
- Improved Kontext documentation with validation split examples
Other Changes
- Added config-local override support via `config.env` files
- Unified multi-GPU logging protections across the codebase
- Cleaned up text embed cache and VAE cache logging
- Removed unused code from prompt module
- Updated `.gitignore` to exclude test runner states
- Test configurations updated: Kontext tests now run for 100 steps, with adjusted settings for SDXL and SANA
- Weights & Biases integration disabled by default in example configurations
- Environment variable configuration files now support local overrides
Pull requests
- update nvidia and apple systems to torch 2.7.1 by @bghira in #1626
- add cosmos PEFT LoRA training by @bghira in #1627
- add options list back to OPTIONS.md by @bghira in #1628
- merge by @bghira in #1629
- hidream needs a canvas size limit set at 1024**2 by @bghira in #1631
- merge by @bghira in #1632
- refactor maximum canvas size enforcement based on PR comment by @bghira in #1634
- hf: thread the loading of captions for substantial speedup. cache captions, load from disk on 2nd+ load by @bghira in #1635
- cleanup logging in multigpu setup for dataloader config, vae cache by @bghira in #1636
- use config-local config.env as override if it exists by @bghira in #1637
- huggingface backend improvements by @bghira in #1640
- do not split metadata buckets for eval datasets by @bghira in #1641
- skip eval splits when training by @bghira in #1642
- kontext docs example for val split by @bghira in #1643
- merge by @bghira in #1644
- allow use of magic string {model_family} in a cache path so that a shared config can be used more effectively by @bghira in #1646
- allow more familiar syntax, undocumented, to use instanceprompt directly by @bghira in #1647
- fix regression introduced by kontext latent handling where ControlNet latents are no longer extracted by @bghira in #1648
- remove unused bit of code and add more useful debug log line by @bghira in #1649
- modify how variables are handled in databackend config so that we can replace any top level value by @bghira in #1650
- use concrete gemma2 class to load Lumina2 by @bghira in #1651
- add example configurations for LoRA and Lycoris training by @bghira in #1652
- fix attention masking and text encoder loading in lumina2 by @bghira in #1653
- when pixart is loading without vae it must assume f=8 by @bghira in #1654
- fix collate of text encoder outputs for old style prompt_attention_mask by @bghira in #1655
- merge by @bghira in #1656
- add CLI for end-to-end test runner by @bghira in #1657
- update kontext test runtime to just 100 steps by @bghira in #1658
- fix test runner test skipping by adding new flag to differentiate between ending one test vs all tests by @bghira in #1659
- fix reference to transformer when exporting by @bghira in #1660
- disable weights and biases by default in the example configs by @bghira in #1661
- add test runner states to ignore file by @bghira in #1662
- update configure.py to have a ncurses interface, allow loading existing configs to modify by @bghira in #1663
- merge by @bghira in #1664
Full Changelog: v2.1.1...v2.1.2
v2.1.1 - SingLoRA + bugfixes
SimpleTuner Release Notes
New Features
SingLoRA
- Use `peft_lora_mode=singlora` with `lora_type=standard` to get a boost in accuracy and a reduced parameter count.
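A minimal sketch of how the two options fit together in the usual `config.json` layout; only `peft_lora_mode` and `lora_type` are the options named above, while the `model_type` key and value formatting are illustrative:

```json
{
  "model_type": "lora",
  "lora_type": "standard",
  "peft_lora_mode": "singlora"
}
```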
Model Support
- Added training support for Lumina2 models
- Added training support for Cosmos2 (2B/14B) models
Training Enhancements
- Introduced LCM (Latent Consistency Model) distillation
- Added DCM (Dual Consistency Model) distillation prototype
- Implemented scheduled Huber loss option for DDPM training (not yet available for flow-matching models)
- Added experimental FP8 mixed-precision training using the torchao backend for full model training
Performance & Infrastructure
- Added HF XET support for faster model downloads
- Extended FastAPI service worker for better WebUI handling
- Improved EMA (Exponential Moving Average) implementation to avoid deprecated integration paths
Bug Fixes
Kontext Training
- Fixed LoRA loading bug when resuming Kontext LoRA training
- Resolved issues with multiple condition Kontext training
- Fixed batched input mutation issues
Validation & Pipeline
- Fixed DistributedDataParallel unwrap issues for validation pipeline
- Resolved NoneType errors during validation checks on multi-GPU setups
- Improved validation stage font selection and label consistency
General Fixes
- Fixed torch inductor issues with unwrapped models
- Resolved SDXL crashes when VAE is not loaded
- Fixed autocast deprecation warnings for SD3
- Corrected issues with newer accelerate versions handling None values
Improvements
Documentation & User Experience
- Added user privacy note to README
- Improved model card generation and saving
- Added functionality to save full training config into checkpoint directory
- Removed unnecessary SDXL VAE warnings
- Enhanced metadata formatting for optimizer configurations
Technical Updates
- Added to_qkv/add_qkv projection targets for LoRA config
- Improved conditioning data sorting for list-based inputs
- Updated dependency management for SingLoRA integration
What's Changed
- add to_qkv / add_qkv proj by @bghira in #1569
- do not unwrap when None because newer accelerate no longer gracefully handles this case by @bghira in #1571
- torch inductor was not working because of using unwrapped model by @bghira in #1572
- merge by @bghira in #1573
- add experimental fp8 mixed-precision option using torchao backend by @bghira in #1574
- formatting fix by @bghira in #1576
- kontext batched input mutation by @bghira in #1575
- merge by @bghira in #1577
- Lumina2 training support by @bghira in #1578
- cosmos2 2B/14B training support by @bghira in #1579
- add scheduled huber loss option for ddpm by @bghira in #1582
- simplify implementation of huber loss by @bghira in #1583
- LCM distillation by @bghira in #1584
- Add DCM distillation prototype by @bghira in #1459
- add note about user privacy being respected to the readme by @bghira in #1585
- when conditioning_data is a list, check inside it for the values to sort by by @bghira in #1589
- fix for kontext training with multiple conditions by @bghira in #1590
- merge by @bghira in #1591
- fix autocast deprecation warning for sd3 by @bghira in #1594
- metadata for optimiser config should have spacing by @bghira in #1595
- add hf xet to nvidia and apple systems for faster downloads by @bghira in #1596
- extend fastapi service worker to better handle webui by @bghira in #1597
- formatting by @bghira in #1598
- kontext: add vae cache dir to example config.json by @bghira in #1601
- merge by @bghira in #1602
- Fix the lora loading bug when resuming the Kontext LoRA training by @GuardSkill in #1600
- ema: dont use the deprecated integration path by @bghira in #1604
- fix default value for --ema_foreach_disable by @bghira in #1605
- fix nonetype error for validation check on multigpu by @bghira in #1606
- Fix DistributedDataParallel unwrap for validations pipeline by @bghira in #1607
- merge by @bghira in #1608
- validation refactor font selection & label consistency by @bghira in #1609
- merge by @bghira in #1610
- Repair bug in validation stage [update validation.py] by @GuardSkill in #1599
- Add SingLoRA support by @bghira in #1612
- sdxl: resolve crash when vae is not loaded by @bghira in #1613
- add singlora dependency by @bghira in #1614
- [nv] add singlora dependency by @bghira in #1615
- sdk: update ui helpers by @bghira in #1616
- singlora needs original_type passed in due to how parameters are grouped by @bghira in #1617
- model card improvements/fixes by @bghira in #1618
- remove SDXL VAE warning by @bghira in #1619
- save model card and full training config into checkpoint dir by @bghira in #1620
- disable SingLoRA ramp-up-steps by @bghira in #1623
- Fix crucial detail for multiple condition Kontext lora training by @GuardSkill in #1621
- fix tests by @bghira in #1624
- merge by @bghira in #1625
New Contributors
- @GuardSkill made their first contribution in #1600
Full Changelog: v2.1...v2.1.1
v2.1 - crucial fixes and a lot of neat stuff
v2.1
✨ New Capabilities
WARNING
- QKV projection behaviour has changed
- Kontext reference dataset captions will now override edit captions, so disable them if you want to avoid that

Area | What’s new |
---|---|
Flexible Conditioning | • Multi-image conditioning in the FLUX pipeline—blend several reference images in one generation. • Define multiple conditioning datasets and randomly select one for each step. |
Data & Augmentation | • Data Generator pipeline: one-command dataset preprocessing that parallelises I/O and heavy transforms. • New JPEG-Artifacts and Random-Masks sample generators for robustness and inpainting workflows. |
Video Validation | Side-by-side “stitching” previews let you compare each validation frame in a single composite video. |
Checkpoints & Weights | • safetensors-merge utility combines multiple .safetensors shards into one file. • QKV-fusion pipeline can now fuse attention projections via Flash Attention 3 or Torch's built-in SDPA for speed. |
Prompting | • instance_prompt now accepts lists → trainer will randomly select an instance prompt on each step. • Caption strategy should now be set to null or left undefined on your reference datasets for Flux Kontext! |
Storage Back-Ends | • S3 and other back-ends can now be re-created from a serialized string, making config files truly portable. |
Bug Fixing | • Wan 2.1 T2V training quality greatly improved, much more stable. • Auraflow training is now rock-solid and likeness can be trained in. |
Note: If you were using fused QKV projections in 2.0.1, you'll need to restart training from step zero because parameter counts will mismatch between the states.
Note: If your conditioning dataset has `caption_strategy` set, those captions will now be used instead of the edit captions.
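A hedged sketch of how these notes combine in a dataloader config; only `instance_prompt` (as a list) and `caption_strategy` come from the notes above, while the `dataset_type` value on the reference entry, the ids, and the paths are illustrative assumptions:

```json
[
  {
    "id": "kontext-edits",
    "type": "local",
    "instance_data_dir": "/data/edited",
    "caption_strategy": "instanceprompt",
    "instance_prompt": ["make the sky red", "turn the sky red at sunset"]
  },
  {
    "id": "kontext-references",
    "type": "local",
    "dataset_type": "conditioning",
    "instance_data_dir": "/data/references",
    "caption_strategy": null
  }
]
```

With the list form, the trainer picks one instance prompt at random on each step; leaving `caption_strategy` null (or undefined) on the reference dataset keeps its captions from overriding the edit captions.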
Pull requests
- add cuda script for installing 12.8 tools on Ubuntu by @bghira in #1533
- update error message to point user to install doc for fa3 support by @bghira in #1534
- update discord link by @bghira in #1538
- Fix accelerator and segmentation by @burgalon in #1540
- Revert "Fix accelerator and segmentation" by @bghira in #1541
- catch error during decompression by @bghira in #1542
- fix unwrapping for DDP by @bghira in #1543
- automatically generate conditioning data by config by @bghira in #1539
- video stitching support for validations by @bghira in #1546
- fix order of validation outputs by @bghira in #1547
- wan 2.1: fix VAE encoding by @bghira in #1548
- EMA: allow qkv fusion without interference by @bghira in #1550
- Add support for multi-reference training (eg, with flux kontext) by @Slickytail in #1528
- add diffusers overrides by @bghira in #1552
- flux: fix sdpa fused attention processor, cannot use sparse tensors or explodes by @bghira in #1553
- multi-conditioning dataset support by @bghira in #1551
- S3 backend needs serialisation methods by @bghira in #1554
- updates for rocm systems by @bghira in #1555
- apple mps updates by @bghira in #1556
- updates for nvidia systems by @bghira in #1557
- Fix miscommunication in Kontext doc by @bghira in #1558
- Remove incorrect value from Kontext config example by @bghira in #1559
- omnigen v1: fix vae scaling, reduces loss value a lot by @bghira in #1560
- fix auto conditioning config load issue by @bghira in #1561
- auraflow: resolve failure to load lora on resume by @bghira in #1562
- fix masked loss application by @bghira in #1563
- auraflow: remove double unpatchify, fix attention masking, fix LoRA target declaration by @bghira in #1564
- update auraflow docs for astraliteheart recommendations by @bghira in #1399
- more robust disabling of validation routine when we have no prompts by @bghira in #1565
- sdk: fix typo by @bghira in #1566
- enable HiDream on apple MPS by using fp32 PE when it's present by @bghira in #1567
- merge by @bghira in #1568
Full Changelog: v2.0.1...v2.1
v2.0.1 - huggingface datasets, qkv fusion, flash attn 3 go brr
✨ Highlights
Area | What’s new & why it matters |
---|---|
GPU / Perf. | • Blackwell‑class GPUs are now recognised by the wheel selector (pyproject update #1499). • Flux ↔ Flash‑Attn 3 fused QKV – an opt‑in training path that exploits FA‑3 kernels and automatically (un)fuses projections on save/load (#1521, #1525). |
Data pipeline | Experimental Hugging Face 🤗 Datasets backend (images + video) with on‑the‑fly metadata caching, split composition, and flexible configuration (#1526). |
Validation UX | Safer defaults: clamp to enabled datasets (#1496); fixed layout, duplicate‑sample guard for multi‑GPU, and final‑epoch EMA snapshot (#1512, #1518‑19, #1524). |
Training quality | Defensive‑weight‑decay strategy promoted to default; optimiser composed automatically (#1500). |
Flux model fixes | Batch‑aware attention masks, VAE‑scale parameter, robust handling of stale cached masks and incompatible inputs (#1502, #1509). |
HiDream | LoRA‑target loading restored (#1494). |
Added
- Flash‑Attn 3 fused QKV option (--flux.flash_fused_qkv) and helper utilities (#1521).
- Automatic fuse / unfuse on checkpoint save‑load, enabling seamless interchange with non‑fused models (#1525).
- 🤗 Datasets backend (image & video), CLI preset subjects200k, and extended docs/examples (#1526).
- AWS S3 compression config key (AWS_COMPRESSION) (#1505).
- Conditioning image stitching: conditioning frame is now shown leftmost in validation grids (#1508).
- EMA weights automatically saved at completion (#1519).
Changed / Improved
- Defensive weight‑decay now on by default; optimiser composed accordingly (#1500).
- Validation pipeline hard‑clamps dataset count (#1496) and skips duplicate IDs in multi‑GPU runs (#1524).
- Flux: attention masks account for batch dimension; additional safeguards for outdated cache (#1502).
- Statetracker no longer purges valid caches (#1503).
- Dependency bumps for the “apple” extra (Metal builds) (#1510).
- General repo tidy‑up, import moves, formatting and lock‑file rationalisation (#1492, #1493, #1496, #1527).
Fixed
- HiDream LoRA‑target deserialisation (#1494).
- Missing declaration in trainer init (#1506).
- Removed rogue property setter call (#1514).
- VAE scale left undefined for Flux VAE (#1509).
- Patch re‑applied cleanly after erroneous layout change (revert #1518, correct #1519).
Documentation
- README notes for QKV fusion and FA‑3 (#1522).
- New docs: documentation/data_backends/huggingface.md, updated dataloader guide and distributed‑training notes (multiple commits in #1526).
Pull Requests
- clean up junk by @bghira in #1492
- post-release fixes by @bghira in #1493
- hidream: fix loading lora targets by @bghira in #1494
- merge by @bghira in #1495
- validation: clamp to number of enabled eval datasets by @bghira in #1496
- merge by @bghira in #1498
- Update pyproject.toml for Blackwell GPU support by @nitinh12 in #1499
- statetracker is deleting caches unnecessarily by @bghira in #1503
- flux: attention mask should take into consideration the batch size by @bghira in #1502
- apply defensive weight decay strategy by default by @bghira in #1500
- merge by @bghira in #1504
- fix missing declaration error by @bghira in #1506
- merge by @bghira in #1507
- add missing vae scale for flux vae by @bghira in #1509
- add missing aws compression config value by @bghira in #1505
- apple: dep updates by @bghira in #1510
- conditioning image stitching by @bghira in #1508
- merge by @bghira in #1511
- fix layout of validation images by @bghira in #1512
- (#1513) remove code that tries to set a property without a setter by @bghira in #1514
- merge by @bghira in #1515
- Revert "fix layout of validation images" by @bghira in #1518
- fix layout, add EMA at the end by @bghira in #1519
- merge by @bghira in #1520
- flux: add option for flash attn 3 based fused qkv training by @bghira in #1521
- add note about qkv fusion to readme by @bghira in #1522
- [multigpu] when retrieving validation set, avoid returning the same image multiple times by @bghira in #1524
- fuse/unfuse qkv projections on save/load by @bghira in #1525
- adjust formatting by @bghira in #1527
- add experimental huggingface datasets support by @bghira in #1526
- merge by @bghira in #1529
New Contributors
- @nitinh12 made their first contribution in #1499
Full Changelog: v2.0...v2.0.1
v2.0 - Kontext [dev], ControlNet Everywhere
🎉 Highlights
SimpleTuner v2.0 is a pretty large overhaul of the modeling framework to support more flexibility, and even supports the creation of custom distillation methods with relatively few changes required. For this flexibility, some backwards compatibility breakage can be expected.
Before upgrading to this release, you should ensure you've completed all of your training runs and no longer wish to re-use the cache files from the earlier versions. They are no longer compatible! Your caches have to be recreated!
- Unified Model Abstraction: Introduction of a new `ModelFoundation` base class, streamlining support and consistency across all model backends (SDXL, SD3, PixArt, HiDream, Auraflow, etc.).
- Broader Model Coverage: First-in-class support for Stable Diffusion 3, PixArt Sigma, HiDream, Auraflow, and WAN.
- ControlNet Everywhere: Full ControlNet integration—training, inference, and PEFT/LoRA adapter support—across SDXL, SD3, PixArt Sigma, HiDream, Auraflow, and Flux models. The HiDream implementation was built right here first, and can likely be improved and optimised.
✨ New Features
Model & Pipeline Support
- Stable Diffusion 3 (SD3): Added some missing features from upstream Diffusers
- PixArt Sigma: Implemented PixArt Sigma's research ControlNet code
- Auraflow: Created a new ControlNet implementation, enabled the training of ControlNet LoRAs
- HiDream: Full support for HiDream with custom ControlNet implementation and pipeline
- Flux Kontext: Full support for Flux Kontext [dev] fine-tuning via full-rank (DeepSpeed), PEFT LoRA, and Lycoris LoKr with models that run seamlessly via Runware.ai
ControlNet & PEFT/LoRA
- Nearly-universal ControlNet:
  - LoHa (SDXL) and PEFT LoRA (everything else) loader mixins for ControlNet layers
  - ControlNet hooks throughout the model framework to allow creating pixel- or latent-based models
Distillation framework
- A new distillation framework allows creation of custom plugins for distilling models using LoRA/Lycoris adapters
- Checkpoint Hooks: Entry points for saving/loading distillation checkpoints, e.g. teacher/student and optimiser weights
⚙️ Enhancements
Performance & Quantization
- Quantisation Workarounds: Updated `helpers/training/quantisation/quanto_workarounds.py` to integrate new backend fixes and TorchAO int8 training workarounds.
- VAE Caching Improvements:
  - Meta-device tensor fixes when clearing cache at epoch boundaries.
  - On-demand caching for ControlNet conditioning inputs.
Data Loading
- Support for Kontext / OmniGen style "edit" dataset configuration, where target and source image pairs can be provided for GPT-4o-style instruct training
Validation
- Now attempts to better pick safe defaults for video models like LTX and Wan
- More robust handling of input validation image samples
- New “edit image caption” style checks.
CLI & Quickstart
- Config Resolution: Configure script now generates a generic default resolution config.
- PyTorch 2.7 & ROCm: Upgraded support for PyTorch 2.7 on macOS and ROCm platforms.
🐞 Bug Fixes
- Multi-Process Caching Races: Added advisory locks and barrier fixes to prevent corruption under heavy parallel loads.
- Device Mismatch Fixes: Enforced proper offload/move logic for text encoders, VAE cache clears, and Flux Kontext metadata paths.
- ControlNet Argument & Tokenizer Handling:
  - Corrected pretrained model name/path flags.
  - Ensured pipeline tokenizers are set to `None` when not reused.
- Video & JPEG XL Handling: Fixed video file discovery after JPEG XL additions and cropping metadata logging for video samples.
📝 Documentation
- Readme Updates:
  - Added notes on Kontext, ControlNet, and LoRA/PEFT workflows.
  - Expanded HiDream usage examples.
- Quickstart Guides: New markdown docs under `documentation/quickstart/FLUX_KONTEXT.md`.
- Code Examples: Inline docstring and example updates for the new `ModelFoundation` API.
What's Changed
- Support --lr-scale-sqrt and tidy up associated output by @clayne in #1366
- SimpleTuner v2.0 (WIP) by @bghira in #1359
- update v2 logic for ltxvideo by @bghira in #1375
- add discord invite back by @bghira in #1376
- remove faulty reference to shape on PIL image by @bghira in #1377
- return transform when caching via image foundation by @bghira in #1378
- update omnigen config example by @bghira in #1379
- train.sh: activate venv automatically by @clayne in #1373
- update lock file for release branch by @bghira in #1385
- HiDream I1 training support by @bghira in #1380
- hidream configure script enablement by @bghira in #1387
- Fix HiDream training for train_batch_size > 1 by @mhirki in #1388
- Fix parent-student training broken by v2 refactor by @mhirki in #1390
- Auraflow training support by @bghira in #1389
- add missing seq len by @bghira in #1392
- (#1393) enable flux_lora_target for v2 lora loading code in flux by @bghira in #1394
- SD1 ControlNet training fixes by @bghira in #1395
- SDXL ControlNet fixes by @bghira in #1396
- split_buckets_between_processes: Fix issue with imbalanced steps between processes by @clayne in #1391
- HiDream: fix pipeline load so that it uses cached value, unlike other models by @bghira in #1397
- fix sana gradient checkpointing code for intervals by @bghira in #1398
- remove diffusers utils logspam by @bghira in #1400
- fix OOM for vae caching on ltxvideo and wan video on 24g cards by @bghira in #1401
- ltxvideo tokeniser shoudl use AutoTokenizer to avoid class mismatches by @bghira in #1402
- wan: fix print of model name at startup by @bghira in #1403
- Update to latest Diffusers main by @bghira in #1404
- Added check for UniPCMultistepScheduler by @ShuyUSTC in #1407
- Fix publishing model card for None caption dropout amount by @bghira in #1409
- fix setting default seq len for sd3 by @bghira in #1410
- update diffusers to latest git main by @bghira in #1411
- Rename FeedForwardSwiGLU back to FeedForward for convenience by @mhirki in #1412
- fix NoneType error when all images list hits race condition during file save/load by @bghira in #1414
- Add support for JPEG-XL (file extension: jxl) by @StableLlama in #1413
- fix unet/transformer subfolder options by @bghira in #1419
- When running sample transforms, the dataset_type should be considered so that we do not run video transforms on image by @bghira in #1420
- poetry: disable virtualenvs creation in local config by @bghira in #1421
- make imports more backwards-compatible by @bghira in #1422
- redirect output from GPU-detection routines to /dev/null more correctly by @bghira in #1423
- Fix input perturbation and noise offset broken by v2 refactor by @mhirki in #1425
- Update HiDream documentation with latest findings and better settings by @mhirki in #1426
- hidream multigpu fixes; PEFT LoRA support by @bghira in #1428
- disable offload at startup by default, allow opt-in via --offload_during_startup by @bghira in #1429
- hidream: remove 2x vae scaling by @bghira in #1430
- when encoding text embeds, force the move to accelerator on round one by @bghira in #1431
- update apple dependencies by @bghira in #1432
- vae cache clear at epoch flip has meta tensor error by @bghira in #1433
- Reintroduce JPEG XL option and fix regression by @StableLlama in #1435
- Use advisory locking while accessing cached state by @clayne in #1444
- HiDream doc updates by @bghira in #1451
- add flux conversion script by @bghira in #1452
- cleanup warnings by @bghira in #1453
- v...
v1.3.2 - maintenance release
What's Changed
- Added CFG-Zero* by default for the SD 3.5 pipeline
- Fixed start/stop declaration for SD 3.5 skip layer guidance
- Reduced logspam, other minor bugfixes (see below)
Pull requests
- resume fix for prodigy optim on new HF Accelerate by @bghira in #1353
- remove more info logs by @bghira in #1354
- add cfg-zero for sd3 by @bghira in #1355
- remove log spam for validation check by @bghira in #1356
- add cfgzero for sd3 by @bghira in #1357
- Remove duplicate prompt by @clayne in #1370
- train.sh: Respect HF_HOME by @clayne in #1371
- Use correct HF token and WanDB api-key env-vars by @clayne in #1372
- merge by @bghira in #1374
Full Changelog: v1.3.1...v1.3.2
v1.3.1 - wan wuz here
🚀 New Features
- Added `disable_validation` option
- Added initial Wan 2.1 T2V training
- Added MPS-compatible Wan 2.1 modeling support
- Added skip layer guidance (SLG) for Wan 2.1 transformer modeling (experimental)
- Added Wan 2.1 LoRA save hook
- Added single-file safetensors load support for Wan models
- Added sageattention support for benchmarking
- Added support for newer Diffusers gradient checkpointing logic
- Enabled FP8 support via torchao on RTX 4090 (PyTorch 2.6)
🐞 Bugfixes
- Fixed dataset URL
- Fixed NVIDIA training compatibility for ltxvideo (previously Apple-only)
- Fixed non-ltx model VAE caching issues
- Fixed default LyCORIS values not found (#1347)
- Fixed validation for command-line arguments
- Fixed default SLG parameters and dtype issues
- Fixed references to start/end fractions in SD3 SLG logic (they were swapped)
- Made the parquet backend's data-type handling more robust
- Fixed CLIP evaluation on images
🧹 Improvements
- Reduced logging spam for VAE caching
- Reduced print spam from Wan pipeline scheduling
- Improved validation resolution handling (multiple of 16)
- Improved hardware utilization notes (especially for 1.3B models)
- Disabled unconditional generation for Wan and LTX models for better performance
- Updated accelerate and torchao dependencies
- Cleaned up log outputs and disabled monkey-patching for older versions of external libraries
📚 Documentation Updates
- Updated and expanded LTX video guide (resolution, memory usage, data loader)
- Updated Wan 2.1 T2V Quickstart Guide, including complete config example
- Added sageattention benchmarking section to Wan documentation
- Clarified VAE slicing and tiling support documentation
- Added explicit memory notes (14B model fitting in 24GB)
What's Changed
- nvidia training fix for ltxvideo (oops, tested on apple only before) by @bghira in #1343
- Update LTX documentation & fix some bugs (SDXL) by @bghira in #1344
- Add an option to completely disable validation by @Slickytail in #1336
- Fix parsing of example lycoris config by @bghira in #1349
- Wan 2.1 T2V training by @bghira in #1348
- update docs by @bghira in #1350
New Contributors
- @Slickytail made their first contribution in #1336
Full Changelog: v1.3.0...v1.3.1
v1.3.0 - the video frontier
Features
- LTX Video training. See the new quickstart for help!
- Use
dataset_type=video
,model_family=ltxvideo
andLightricks/LTX-Video
for model path, and you're good to go! - Dataset is just folder of MP4 or other video files
- By default, we truncate to 5 second length
- Use
- Single file loading. No longer needs a Huggingface Hub or Diffusers style layout to load your weights.
- Updated dependencies.
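A hedged sketch of a video dataset entry for the dataloader config; only `dataset_type=video` comes from the feature list above (set `model_family=ltxvideo` and `Lightricks/LTX-Video` as the model path in your trainer config as usual), while the id, path, and caption strategy are illustrative:

```json
[
  {
    "id": "my-videos",
    "type": "local",
    "dataset_type": "video",
    "instance_data_dir": "/data/videos",
    "caption_strategy": "textfile"
  }
]
```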
Pull requests
- fix SDXL time ids by @bghira in #1327
- merge by @bghira in #1328
- update dependency pins by @bghira in #1329
- update apple dependencies by @bghira in #1331
- experimental: add single-file loading support for sdxl, flux, sd3 by @bghira in #1334
- merge by @bghira in #1335
- LTX Video training support by @bghira in #1341
- merge by @bghira in #1342
Full Changelog: v1.2.5...v1.3.0
v1.2.5
Features
- New `dataset_type=eval` allows setting up your own eval splits for stable validation loss calculations (sketch below)
- Prodigy allows use of LR schedulers now
- Minimum and maximum aspect ratio options added to dataloader
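A minimal sketch of an eval split entry in the dataloader config, assuming it takes the same shape as a regular image dataset entry; only `dataset_type=eval` is taken from the feature list, the rest is illustrative:

```json
[
  {
    "id": "holdout-eval",
    "type": "local",
    "dataset_type": "eval",
    "instance_data_dir": "/data/eval-images",
    "caption_strategy": "textfile"
  }
]
```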
Bugfixes
- Fix for S3 data backend + compressed datasets
- VAE cache deletion at epoch end no longer crashes training when delete failures occur (multi-GPU)
- Regularisation datasets will no longer break Lycoris training on resume
Pull requests
- s3 backend fix for reading compressed cache by @bghira in #1297
- vae cache refresh during training should not treat delete failures as errors by @bghira in #1298
- update lycoris defaults, fix regularised training resume by @bghira in #1300
- prodigy may work best with cosine scheduler over long training runs by @bghira in #1302
- (#1113) add minimum and maximum aspect ratio bucket parameter and associated tests by @bghira in #1303
- add stable loss calculation and tracking by @bghira in #1304
Full Changelog: v1.2.4...v1.2.5
v1.2.4 - The Prodigal child returns
Stable Diffusion 3.5 Medium fine-tuned on v1.2.4
Features
Ignore final epochs for changes in dataloader length
- Use `--ignore_final_epochs=true` to disable tracking of epochs so that your max train steps value is reached (see the sketch below).
- This is helpful if you remove or add a substantial amount of data from your training set.
- Remember to use `--max_train_steps` instead of `--num_epochs` when using this option.
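A sketch of the combination described above in `config.json` form; the flag spellings follow the bullet points, and whether your config keeps the leading dashes or quotes the values depends on how it was generated:

```json
{
  "--ignore_final_epochs": "true",
  "--max_train_steps": 10000
}
```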
New experimental Prodigy optimiser
- Thanks to @LoganBooker we now have a new implementation of Prodigy that supports stochastic rounding and other features needed to reintroduce support.
  - You may want to adjust `--optimizer_config=d_coef=1` to a lower value to make the ramp-up and max LR lower (sketch below).
  - Changing LR is not currently tested/supported.
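A hedged sketch of the suggested adjustment in `config.json` form; `d_coef=0.5` is just one example of "a lower value", and the `--optimizer` key is an assumption about how Prodigy is selected in your setup:

```json
{
  "--optimizer": "prodigy",
  "--optimizer_config": "d_coef=0.5"
}
```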
Bugfixes
Image preprocessing
- VAE cache elements were being cropped too far for square input images that were larger than the target resolution.
- Example: An input of 1024x1024 with resolution of 512 and resolution_type=pixel_area or =area would be overly cropped straight from 1024px to 512px
- Not impacted: An input of 1024x1024 with a resolution of 1024 and any resolution_type worked as intended
- Not impacted: An input of 1024px and resolution of 512 with resolution_type=pixel
You'll want to recreate VAE caches and dataset metadata to benefit from this bugfix.
To remove the metadata, find and delete the `*.json` files from your image directories.
Sana
- Fixed modeling code after the PEFT LoRA addition broke compatibility
AMD ROCm
- Updated the list of BNB optimisers; another minor fix for MI300+ users
Validations
- Fixed validations not running for the final model export at the end of training
What's Changed
- instanceprompt strategy fix for caption discovery by @bghira in #1271
- sana: fix modeling code reference to attention_kwargs by @bghira in #1273
- Small fixes for running on AMD GPUs by @rkarhila-amd in #1276
- add a special case for square input images where we need to resize to the target as intermediary, which can be considered a safe operation by @bghira in #1281
- catch and handle wandb error when it is disabled by @bghira in #1282
- add debug logging for factory initialisation in multigpu systems where it seems to get stuck, and format some files by @bghira in #1283
- add ignore_final_epochs to workaround epoch tracking oddness when changing dataloader length by @bghira in #1285
- fix divide by zero when reducing dataloader length by @bghira in #1286
- update options doc for --ignore_final_epochs by @bghira in #1287
- fix check for running the final validations by @bghira in #1290
- Fixing broken python-tests action by @diodotosml in #1293
- add prodigy optimiser with full bf16 support by @bghira in #1294
- update docs for optimizer args by @bghira in #1295
New Contributors
- @rkarhila-amd made their first contribution in #1276
- @diodotosml made their first contribution in #1293
Full Changelog: v1.2.3...v1.2.4