Releases: bghira/SimpleTuner
v2.1.2 - Hugging Face backend speedup, QoL, and Bugfixes

Training Enhancements
- Cosmos PEFT LoRA Training: Added support for Cosmos PEFT LoRA training, expanding fine-tuning capabilities
- Model Family-Based Caching: Introduced the `{model_family}` magic string for cache paths, allowing shared caches across different model types (see the sketch after this list)
- Enhanced Configuration Tool: Completely revamped `configure.py` with an ncurses interface for easier configuration management and the ability to load existing configs
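For example, a dataloader entry can point its caches at a templated path so one config serves several model families. This is a minimal sketch only; the surrounding key names (`cache_dir_vae`, `cache_dir`, and friends) follow the usual dataloader layout but should be checked against your own config rather than copied verbatim:

```json
[
  {
    "id": "my-images",
    "type": "local",
    "instance_data_dir": "/data/images",
    "caption_strategy": "textfile",
    "cache_dir_vae": "cache/vae/{model_family}/my-images"
  },
  {
    "id": "text-embeds",
    "dataset_type": "text_embeds",
    "type": "local",
    "default": true,
    "cache_dir": "cache/text/{model_family}"
  }
]
```

Switching the model family between runs then reuses the same dataset definition while keeping each family's latents and text embeds in separate directories.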
Testing & Development
- End-to-End Test Runner: New CLI tool for comprehensive end-to-end testing across different model configurations
- Example Configurations: Added complete example configurations for LoRA and Lycoris training with documentation
Performance Optimizations
- Multi-threaded Caption Loading: Significantly faster HuggingFace caption loading through parallel processing
- Improved VAE Caching: Better cache handling for HuggingFace datasets with automatic path resolution
- Multi-GPU Optimizations: Cleaned up logging and added barrier synchronization to prevent race conditions
Model Support
- Torch 2.7.1: Updated NVIDIA and Apple systems to PyTorch 2.7.1
- Lumina2 Fixes: Fixed text encoder loading and attention masking issues
- PixArt Improvements: Resolved VAE initialization and transformer reference issues
- HiDream Canvas Limits: Enforced 1024² canvas size limit for stability with aspect bucketing
Bug Fixes
- Fixed Flux ControlNet regression with list-to-dict conversion
- Resolved eval dataset splitting issues; eval splits are now properly skipped during training
- Fixed collation of text encoder outputs for legacy prompt attention masks
- Corrected instance prompt handling to support more familiar syntax
- Fixed PixArt VAE assumption (f=8) when loading without a VAE
Documentation
- Updated OPTIONS.md with comprehensive configuration options
- Added README for example configurations with venv activation instructions
- Improved Kontext documentation with validation split examples
Other Changes
- Added config-local override support via `config.env` files
- Unified multi-GPU logging protections across the codebase
- Cleaned up text embed cache and VAE cache logging
- Removed unused code from prompt module
- Updated `.gitignore` to exclude test runner states
- Test configurations updated: Kontext tests now run for 100 steps, with adjusted settings for SDXL and SANA
- Weights & Biases integration disabled by default in example configurations
- Environment variable configuration files now support local overrides
Pull requests
- update nvidia and apple systems to torch 2.7.1 by @bghira in #1626
- add cosmos PEFT LoRA training by @bghira in #1627
- add options list back to OPTIONS.md by @bghira in #1628
- merge by @bghira in #1629
- hidream needs a canvas size limit set at 1024**2 by @bghira in #1631
- merge by @bghira in #1632
- refactor maximum canvas size enforcement based on PR comment by @bghira in #1634
- hf: thread the loading of captions for substantial speedup. cache captions, load from disk on 2nd+ load by @bghira in #1635
- cleanup logging in multigpu setup for dataloader config, vae cache by @bghira in #1636
- use config-local config.env as override if it exists by @bghira in #1637
- huggingface backend improvements by @bghira in #1640
- do not split metadata buckets for eval datasets by @bghira in #1641
- skip eval splits when training by @bghira in #1642
- kontext docs example for val split by @bghira in #1643
- merge by @bghira in #1644
- allow use of magic string {model_family} in a cache path so that a shared config can be used more effectively by @bghira in #1646
- allow more familiar syntax, undocumented, to use instanceprompt directly by @bghira in #1647
- fix regression introduced by kontext latent handling where ControlNet latents are no longer extracted by @bghira in #1648
- remove unused bit of code and add more useful debug log line by @bghira in #1649
- modify how variables are handled in databackend config so that we can replace any top level value by @bghira in #1650
- use concrete gemma2 class to load Lumina2 by @bghira in #1651
- add example configurations for LoRA and Lycoris training by @bghira in #1652
- fix attention masking and text encoder loading in lumina2 by @bghira in #1653
- when pixart is loading without vae it must assume f=8 by @bghira in #1654
- fix collate of text encoder outputs for old style prompt_attention_mask by @bghira in #1655
- merge by @bghira in #1656
- add CLI for end-to-end test runner by @bghira in #1657
- update kontext test runtime to just 100 steps by @bghira in #1658
- fix test runner test skipping by adding new flag to differentiate between ending one test vs all tests by @bghira in #1659
- fix reference to transformer when exporting by @bghira in #1660
- disable weights and biases by default in the example configs by @bghira in #1661
- add test runner states to ignore file by @bghira in #1662
- update configure.py to have a ncurses interface, allow loading existing configs to modify by @bghira in #1663
- merge by @bghira in #1664
Full Changelog: v2.1.1...v2.1.2
v2.1.1 - SingLoRA + bugfixes
SimpleTuner Release Notes
New Features
SingLoRA
- Use `peft_lora_mode=singlora` with `lora_type=standard` to get a boost in accuracy and a reduced parameter count.
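A minimal sketch of how the two options fit together in the usual `config.json` layout; only `peft_lora_mode` and `lora_type` are the options named above, while the `model_type` key and value formatting are illustrative:

```json
{
  "model_type": "lora",
  "lora_type": "standard",
  "peft_lora_mode": "singlora"
}
```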
Model Support
- Added training support for Lumina2 models
- Added training support for Cosmos2 (2B/14B) models
Training Enhancements
- Introduced LCM (Latent Consistency Model) distillation
- Added DCM (Dual Consistency Model) distillation prototype
- Implemented scheduled Huber loss option for DDPM training (not yet available for flow-matching models)
- Added experimental FP8 mixed-precision training using the torchao backend for full model training
Performance & Infrastructure
- Added HF XET support for faster model downloads
- Extended FastAPI service worker for better WebUI handling
- Improved EMA (Exponential Moving Average) implementation to avoid deprecated integration paths
Bug Fixes
Kontext Training
- Fixed LoRA loading bug when resuming Kontext LoRA training
- Resolved issues with multiple condition Kontext training
- Fixed batched input mutation issues
Validation & Pipeline
- Fixed DistributedDataParallel unwrap issues for validation pipeline
- Resolved NoneType errors during validation checks on multi-GPU setups
- Improved validation stage font selection and label consistency
General Fixes
- Fixed torch inductor issues with unwrapped models
- Resolved SDXL crashes when VAE is not loaded
- Fixed autocast deprecation warnings for SD3
- Corrected issues with newer accelerate versions handling None values
Improvements
Documentation & User Experience
- Added user privacy note to README
- Improved model card generation and saving
- Added functionality to save full training config into checkpoint directory
- Removed unnecessary SDXL VAE warnings
- Enhanced metadata formatting for optimizer configurations
Technical Updates
- Added to_qkv/add_qkv projection targets for LoRA config
- Improved conditioning data sorting for list-based inputs
- Updated dependency management for SingLoRA integration
What's Changed
- add to_qkv / add_qkv proj by @bghira in #1569
- do not unwrap when None because newer accelerate no longer gracefully handles this case by @bghira in #1571
- torch inductor was not working because of using unwrapped model by @bghira in #1572
- merge by @bghira in #1573
- add experimental fp8 mixed-precision option using torchao backend by @bghira in #1574
- formatting fix by @bghira in #1576
- kontext batched input mutation by @bghira in #1575
- merge by @bghira in #1577
- Lumina2 training support by @bghira in #1578
- cosmos2 2B/14B training support by @bghira in #1579
- add scheduled huber loss option for ddpm by @bghira in #1582
- simplify implementation of huber loss by @bghira in #1583
- LCM distillation by @bghira in #1584
- Add DCM distillation prototype by @bghira in #1459
- add note about user privacy being respected to the readme by @bghira in #1585
- when conditioning_data is a list, check inside it for the values to sort by by @bghira in #1589
- fix for kontext training with multiple conditions by @bghira in #1590
- merge by @bghira in #1591
- fix autocast deprecation warning for sd3 by @bghira in #1594
- metadata for optimiser config should have spacing by @bghira in #1595
- add hf xet to nvidia and apple systems for faster downloads by @bghira in #1596
- extend fastapi service worker to better handle webui by @bghira in #1597
- formatting by @bghira in #1598
- kontext: add vae cache dir to example config.json by @bghira in #1601
- merge by @bghira in #1602
- Fix the lora loading bug when resuming the Kontext LoRA training by @GuardSkill in #1600
- ema: dont use the deprecated integration path by @bghira in #1604
- fix default value for --ema_foreach_disable by @bghira in #1605
- fix nonetype error for validation check on multigpu by @bghira in #1606
- Fix DistributedDataParallel unwrap for validations pipeline by @bghira in #1607
- merge by @bghira in #1608
- validation refactor font selection & label consistency by @bghira in #1609
- merge by @bghira in #1610
- Repair bug in validation stage [update validation.py] by @GuardSkill in #1599
- Add SingLoRA support by @bghira in #1612
- sdxl: resolve crash when vae is not loaded by @bghira in #1613
- add singlora dependency by @bghira in #1614
- [nv] add singlora dependency by @bghira in #1615
- sdk: update ui helpers by @bghira in #1616
- singlora needs original_type passed in due to how parameters are grouped by @bghira in #1617
- model card improvements/fixes by @bghira in #1618
- remove SDXL VAE warning by @bghira in #1619
- save model card and full training config into checkpoint dir by @bghira in #1620
- disable SingLoRA ramp-up-steps by @bghira in #1623
- Fix crucial detail for multiple condition Kontext lora training by @GuardSkill in #1621
- fix tests by @bghira in #1624
- merge by @bghira in #1625
New Contributors
- @GuardSkill made their first contribution in #1600
Full Changelog: v2.1...v2.1.1
v2.1 - crucial fixes and a lot of neat stuff
v2.1
✨ New Capabilities
WARNING
- QKV projection behaviour has changed
- Kontext reference dataset captions will now override edit captions, so disable them if you want to avoid that

Area | What’s new |
---|---|
Flexible Conditioning | • Multi-image conditioning in the FLUX pipeline—blend several reference images in one generation. • Define multiple conditioning datasets and randomly select one for each step. |
Data & Augmentation | • Data Generator pipeline: one-command dataset preprocessing that parallelises I/O and heavy transforms. • New JPEG-Artifacts and Random-Masks sample generators for robustness and inpainting workflows. |
Video Validation | Side-by-side “stitching” previews let you compare each validation frame in a single composite video. |
Checkpoints & Weights | • safetensors-merge utility combines multiple .safetensors shards into one file. • QKV-fusion pipeline can now fuse attention projections via Flash Attention 3 or Torch's built-in SDPA for speed. |
Prompting | • instance_prompt now accepts lists → trainer will randomly select an instance prompt on each step. • Caption strategy should now be set to null or left undefined on your reference datasets for Flux Kontext! |
Storage Back-Ends | • S3 and other back-ends can now be re-created from a serialized string, making config files truly portable. |
Bug Fixing | • Wan 2.1 T2V training quality greatly improved, much more stable. • Auraflow training is now rock-solid and likeness can be trained in. |
Note: If you were using fused QKV projections in 2.0.1, you'll need to restart training from step zero because parameter counts will mismatch between the states.
Note: If your conditioning dataset has `caption_strategy` set, those captions will now be used instead of the edit captions.
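A hedged sketch of how these notes combine in a dataloader config; only `instance_prompt` (as a list) and `caption_strategy` come from the notes above, while the `dataset_type` value on the reference entry, the ids, and the paths are illustrative assumptions:

```json
[
  {
    "id": "kontext-edits",
    "type": "local",
    "instance_data_dir": "/data/edited",
    "caption_strategy": "instanceprompt",
    "instance_prompt": ["make the sky red", "turn the sky red at sunset"]
  },
  {
    "id": "kontext-references",
    "type": "local",
    "dataset_type": "conditioning",
    "instance_data_dir": "/data/references",
    "caption_strategy": null
  }
]
```

With the list form, the trainer picks one instance prompt at random on each step; leaving `caption_strategy` null (or undefined) on the reference dataset keeps its captions from overriding the edit captions.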
Pull requests
- add cuda script for installing 12.8 tools on Ubuntu by @bghira in #1533
- update error message to point user to install doc for fa3 support by @bghira in #1534
- update discord link by @bghira in #1538
- Fix accelerator and segmentation by @burgalon in #1540
- Revert "Fix accelerator and segmentation" by @bghira in #1541
- catch error during decompression by @bghira in #1542
- fix unwrapping for DDP by @bghira in #1543
- automatically generate conditioning data by config by @bghira in #1539
- video stitching support for validations by @bghira in #1546
- fix order of validation outputs by @bghira in #1547
- wan 2.1: fix VAE encoding by @bghira in #1548
- EMA: allow qkv fusion without interference by @bghira in #1550
- Add support for multi-reference training (eg, with flux kontext) by @Slickytail in #1528
- add diffusers overrides by @bghira in #1552
- flux: fix sdpa fused attention processor, cannot use sparse tensors or explodes by @bghira in #1553
- multi-conditioning dataset support by @bghira in #1551
- S3 backend needs serialisation methods by @bghira in #1554
- updates for rocm systems by @bghira in #1555
- apple mps updates by @bghira in #1556
- updates for nvidia systems by @bghira in #1557
- Fix miscommunication in Kontext doc by @bghira in #1558
- Remove incorrect value from Kontext config example by @bghira in #1559
- omnigen v1: fix vae scaling, reduces loss value a lot by @bghira in #1560
- fix auto conditioning config load issue by @bghira in #1561
- auraflow: resolve failure to load lora on resume by @bghira in #1562
- fix masked loss application by @bghira in #1563
- auraflow: remove double unpatchify, fix attention masking, fix LoRA target declaration by @bghira in #1564
- update auraflow docs for astraliteheart recommendations by @bghira in #1399
- more robust disabling of validation routine when we have no prompts by @bghira in #1565
- sdk: fix typo by @bghira in #1566
- enable HiDream on apple MPS by using fp32 PE when it's present by @bghira in #1567
- merge by @bghira in #1568
Full Changelog: v2.0.1...v2.1
v2.0.1 - huggingface datasets, qkv fusion, flash attn 3 go brr
✨ Highlights
Area | What’s new & why it matters |
---|---|
GPU / Perf. | • Blackwell‑class GPUs are now recognised by the wheel selector (pyproject update #1499). • Flux ↔ Flash‑Attn 3 fused QKV – an opt‑in training path that exploits FA‑3 kernels and automatically (un)fuses projections on save/load (#1521, #1525). |
Data pipeline | Experimental Hugging Face 🤗 Datasets backend (images + video) with on‑the‑fly metadata caching, split composition, and flexible configuration (#1526). |
Validation UX | Safer defaults: clamp to enabled datasets (#1496); fixed layout, duplicate‑sample guard for multi‑GPU, and final‑epoch EMA snapshot (#1512, #1518‑19, #1524). |
Training quality | Defensive‑weight‑decay strategy promoted to default; optimiser composed automatically (#1500). |
Flux model fixes | Batch‑aware attention masks, VAE‑scale parameter, robust handling of stale cached masks and incompatible inputs (#1502, #1509). |
HiDream | LoRA‑target loading restored (#1494). |
Added
- Flash‑Attn 3 fused QKV option (--flux.flash_fused_qkv) and helper utilities (#1521).
- Automatic fuse / unfuse on checkpoint save‑load, enabling seamless interchange with non‑fused models (#1525).
- 🤗 Datasets backend (image & video), CLI preset subjects200k, and extended docs/examples (#1526).
- AWS S3 compression config key (AWS_COMPRESSION) (#1505).
- Conditioning image stitching: conditioning frame is now shown leftmost in validation grids (#1508).
- EMA weights automatically saved at completion (#1519).
Changed / Improved
- Defensive weight‑decay now on by default; optimiser composed accordingly (#1500).
- Validation pipeline hard‑clamps dataset count (#1496) and skips duplicate IDs in multi‑GPU runs (#1524).
- Flux: attention masks account for batch dimension; additional safeguards for outdated cache (#1502).
- Statetracker no longer purges valid caches (#1503).
- Dependency bumps for the “apple” extra (Metal builds) (#1510).
- General repo tidy‑up, import moves, formatting and lock‑file rationalisation (#1492, #1493, #1496, #1527).
Fixed
- HiDream LoRA‑target deserialisation (#1494).
- Missing declaration in trainer init (#1506).
- Removed rogue property setter call (#1514).
- VAE scale left undefined for Flux VAE (#1509).
- Patch re‑applied cleanly after erroneous layout change (revert #1518, correct #1519).
Documentation
- README notes for QKV fusion and FA‑3 (#1522).
- New docs: documentation/data_backends/huggingface.md, updated dataloader guide and distributed‑training notes (multiple commits in #1526).
Pull Requests
- clean up junk by @bghira in #1492
- post-release fixes by @bghira in #1493
- hidream: fix loading lora targets by @bghira in #1494
- merge by @bghira in #1495
- validation: clamp to number of enabled eval datasets by @bghira in #1496
- merge by @bghira in #1498
- Update pyproject.toml for Blackwell GPU support by @nitinh12 in #1499
- statetracker is deleting caches unnecessarily by @bghira in #1503
- flux: attention mask should take into consideration the batch size by @bghira in #1502
- apply defensive weight decay strategy by default by @bghira in #1500
- merge by @bghira in #1504
- fix missing declaration error by @bghira in #1506
- merge by @bghira in #1507
- add missing vae scale for flux vae by @bghira in #1509
- add missing aws compression config value by @bghira in #1505
- apple: dep updates by @bghira in #1510
- conditioning image stitching by @bghira in #1508
- merge by @bghira in #1511
- fix layout of validation images by @bghira in #1512
- (#1513) remove code that tries to set a property without a setter by @bghira in #1514
- merge by @bghira in #1515
- Revert "fix layout of validation images" by @bghira in #1518
- fix layout, add EMA at the end by @bghira in #1519
- merge by @bghira in #1520
- flux: add option for flash attn 3 based fused qkv training by @bghira in #1521
- add note about qkv fusion to readme by @bghira in #1522
- [multigpu] when retrieving validation set, avoid returning the same image multiple times by @bghira in #1524
- fuse/unfuse qkv projections on save/load by @bghira in #1525
- adjust formatting by @bghira in #1527
- add experimental huggingface datasets support by @bghira in #1526
- merge by @bghira in #1529
New Contributors
- @nitinh12 made their first contribution in #1499
Full Changelog: v2.0...v2.0.1
v2.0 - Kontext [dev], ControlNet Everywhere
🎉 Highlights
SimpleTuner v2.0 is a pretty large overhaul of the modeling framework to support more flexibility, and even supports the creation of custom distillation methods with relatively few changes required. For this flexibility, some backwards compatibility breakage can be expected.
Before upgrading to this release, you should ensure you've completed all of your training runs and no longer wish to re-use the cache files from the earlier versions. They are no longer compatible! Your caches have to be recreated!
- Unified Model Abstraction: Introduction of a new `ModelFoundation` base class, streamlining support and consistency across all model backends (SDXL, SD3, PixArt, HiDream, Auraflow, etc.).
- Broader Model Coverage: First-in-class support for Stable Diffusion 3, PixArt Sigma, HiDream, Auraflow, and WAN.
- ControlNet Everywhere: Full ControlNet integration—training, inference, and PEFT/LoRA adapter support—across SDXL, SD3, PixArt Sigma, HiDream, Auraflow, and Flux models. The HiDream implementation was built right here first, and can likely be improved and optimised.
✨ New Features
Model & Pipeline Support
- Stable Diffusion 3 (SD3): Added some missing features from upstream Diffusers
- PixArt Sigma: Implemented PixArt Sigma's research ControlNet code
- Auraflow: Created a new ControlNet implementation, enabled the training of ControlNet LoRAs
- HiDream: Full support for HiDream with custom ControlNet implementation and pipeline
- Flux Kontext: Full support for Flux Kontext [dev] fine-tuning via full-rank (DeepSpeed), PEFT LoRA, and Lycoris LoKr with models that run seamlessly via Runware.ai
ControlNet & PEFT/LoRA
- Nearly-universal ControlNet:
  - LoHa (SDXL) and PEFT LoRA (everything else) loader mixins for ControlNet layers
  - ControlNet hooks throughout the model framework to allow creating pixel- or latent-based models
Distillation framework
- A new distillation framework allows creation of custom plugins for distilling models using LoRA/Lycoris adapters
- Checkpoint Hooks: Entry points for saving/loading distillation checkpoints, e.g. teacher/student and optimiser weights
⚙️ Enhancements
Performance & Quantization
- Quantisation Workarounds: Updated `helpers/training/quantisation/quanto_workarounds.py` to integrate new backend fixes and TorchAO int8 training workarounds.
- VAE Caching Improvements:
  - Meta-device tensor fixes when clearing cache at epoch boundaries.
  - On-demand caching for ControlNet conditioning inputs.
Data Loading
- Support for Kontext / OmniGen style "edit" dataset configuration, where target and source image pairs can be provided for GPT-4o-style instruct training
Validation
- Now attempts to better pick safe defaults for video models like LTX and Wan
- More robust handling of input validation image samples
- New “edit image caption” style checks.
CLI & Quickstart
- Config Resolution: Configure script now generates a generic default resolution config.
- PyTorch 2.7 & ROCm: Upgraded support for PyTorch 2.7 on macOS and ROCm platforms.
🐞 Bug Fixes
- Multi-Process Caching Races: Added advisory locks and barrier fixes to prevent corruption under heavy parallel loads.
- Device Mismatch Fixes: Enforced proper offload/move logic for text encoders, VAE cache clears, and Flux Kontext metadata paths.
- ControlNet Argument & Tokenizer Handling:
  - Corrected pretrained model name/path flags.
  - Ensured pipeline tokenizers are set to `None` when not reused.
- Video & JPEG XL Handling: Fixed video file discovery after JPEG XL additions and cropping metadata logging for video samples.
📝 Documentation
- Readme Updates:
  - Added notes on Kontext, ControlNet, and LoRA/PEFT workflows.
  - Expanded HiDream usage examples.
- Quickstart Guides: New markdown docs under `documentation/quickstart/FLUX_KONTEXT.md`.
- Code Examples: Inline docstring and example updates for the new `ModelFoundation` API.
What's Changed
- Support --lr-scale-sqrt and tidy up associated output by @clayne in #1366
- SimpleTuner v2.0 (WIP) by @bghira in #1359
- update v2 logic for ltxvideo by @bghira in #1375
- add discord invite back by @bghira in #1376
- remove faulty reference to shape on PIL image by @bghira in #1377
- return transform when caching via image foundation by @bghira in #1378
- update omnigen config example by @bghira in #1379
- train.sh: activate venv automatically by @clayne in #1373
- update lock file for release branch by @bghira in #1385
- HiDream I1 training support by @bghira in #1380
- hidream configure script enablement by @bghira in #1387
- Fix HiDream training for train_batch_size > 1 by @mhirki in #1388
- Fix parent-student training broken by v2 refactor by @mhirki in #1390
- Auraflow training support by @bghira in #1389
- add missing seq len by @bghira in #1392
- (#1393) enable flux_lora_target for v2 lora loading code in flux by @bghira in #1394
- SD1 ControlNet training fixes by @bghira in #1395
- SDXL ControlNet fixes by @bghira in #1396
- split_buckets_between_processes: Fix issue with imbalanced steps between processes by @clayne in #1391
- HiDream: fix pipeline load so that it uses cached value, unlike other models by @bghira in #1397
- fix sana gradient checkpointing code for intervals by @bghira in #1398
- remove diffusers utils logspam by @bghira in #1400
- fix OOM for vae caching on ltxvideo and wan video on 24g cards by @bghira in #1401
- ltxvideo tokeniser shoudl use AutoTokenizer to avoid class mismatches by @bghira in #1402
- wan: fix print of model name at startup by @bghira in #1403
- Update to latest Diffusers main by @bghira in #1404
- Added check for UniPCMultistepScheduler by @ShuyUSTC in #1407
- Fix publishing model card for None caption dropout amount by @bghira in #1409
- fix setting default seq len for sd3 by @bghira in #1410
- update diffusers to latest git main by @bghira in #1411
- Rename FeedForwardSwiGLU back to FeedForward for convenience by @mhirki in #1412
- fix NoneType error when all images list hits race condition during file save/load by @bghira in #1414
- Add support for JPEG-XL (file extension: jxl) by @StableLlama in #1413
- fix unet/transformer subfolder options by @bghira in #1419
- When running sample transforms, the dataset_type should be considered so that we do not run video transforms on image by @bghira in #1420
- poetry: disable virtualenvs creation in local config by @bghira in #1421
- make imports more backwards-compatible by @bghira in #1422
- redirect output from GPU-detection routines to /dev/null more correctly by @bghira in #1423
- Fix input perturbation and noise offset broken by v2 refactor by @mhirki in #1425
- Update HiDream documentation with latest findings and better settings by @mhirki in #1426
- hidream multigpu fixes; PEFT LoRA support by @bghira in #1428
- disable offload at startup by default, allow opt-in via --offload_during_startup by @bghira in #1429
- hidream: remove 2x vae scaling by @bghira in #1430
- when encoding text embeds, force the move to accelerator on round one by @bghira in #1431
- update apple dependencies by @bghira in #1432
- vae cache clear at epoch flip has meta tensor error by @bghira in #1433
- Reintroduce JPEG XL option and fix regression by @StableLlama in #1435
- Use advisory locking while accessing cached state by @clayne in #1444
- HiDream doc updates by @bghira in #1451
- add flux conversion script by @bghira in #1452
- cleanup warnings by @bghira in #1453
- v...
v1.3.2 - maintenance release
What's Changed
- Added CFG-Zero* by default for the SD 3.5 pipeline
- Fixed start/stop declaration for SD 3.5 skip layer guidance
- Reduced logspam, other minor bugfixes (see below)
Pull requests
- resume fix for prodigy optim on new HF Accelerate by @bghira in #1353
- remove more info logs by @bghira in #1354
- add cfg-zero for sd3 by @bghira in #1355
- remove log spam for validation check by @bghira in #1356
- add cfgzero for sd3 by @bghira in #1357
- Remove duplicate prompt by @clayne in #1370
- train.sh: Respect HF_HOME by @clayne in #1371
- Use correct HF token and WanDB api-key env-vars by @clayne in #1372
- merge by @bghira in #1374
Full Changelog: v1.3.1...v1.3.2
v1.3.1 - wan wuz here
🚀 New Features
- Added `disable_validation` option
- Added initial Wan 2.1 T2V training
- Added MPS-compatible Wan 2.1 modeling support
- Added skip layer guidance (SLG) for Wan 2.1 transformer modeling (experimental)
- Added Wan 2.1 LoRA save hook
- Added single-file safetensors load support for Wan models
- Added sageattention support for benchmarking
- Added support for newer Diffusers gradient checkpointing logic
- Enabled FP8 support via torchao on RTX 4090 (PyTorch 2.6)
🐞 Bugfixes
- Fixed dataset URL
- Fixed NVIDIA training compatibility for ltxvideo (previously Apple-only)
- Fixed non-ltx model VAE caching issues
- Fixed default LyCORIS values not found (#1347)
- Fixed validation for command-line arguments
- Fixed default SLG parameters and dtype issues
- Fixed references to start/end fractions in SD3 SLG logic (they were swapped)
- Made the parquet backend's data-type handling more robust
- Fixed CLIP evaluation on images
🧹 Improvements
- Reduced logging spam for VAE caching
- Reduced print spam from Wan pipeline scheduling
- Improved validation resolution handling (multiple of 16)
- Improved hardware utilization notes (especially for 1.3B models)
- Disabled unconditional generation for Wan and LTX models for better performance
- Updated accelerate and torchao dependencies
- Cleaned up log outputs and disabled monkey-patching for older versions of external libraries
📚 Documentation Updates
- Updated and expanded LTX video guide (resolution, memory usage, data loader)
- Updated Wan 2.1 T2V Quickstart Guide, including complete config example
- Added sageattention benchmarking section to Wan documentation
- Clarified VAE slicing and tiling support documentation
- Added explicit memory notes (14B model fitting in 24GB)
What's Changed
- nvidia training fix for ltxvideo (oops, tested on apple only before) by @bghira in #1343
- Update LTX documentation & fix some bugs (SDXL) by @bghira in #1344
- Add an option to completely disable validation by @Slickytail in #1336
- Fix parsing of example lycoris config by @bghira in #1349
- Wan 2.1 T2V training by @bghira in #1348
- update docs by @bghira in #1350
New Contributors
- @Slickytail made their first contribution in #1336
Full Changelog: v1.3.0...v1.3.1
v1.3.0 - the video frontier
Features
- LTX Video training. See the new quickstart for help!
- Use
dataset_type=video
,model_family=ltxvideo
andLightricks/LTX-Video
for model path, and you're good to go! - Dataset is just folder of MP4 or other video files
- By default, we truncate to 5 second length
- Use
- Single file loading. No longer needs a Huggingface Hub or Diffusers style layout to load your weights.
- Updated dependencies.
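A hedged sketch of a video dataset entry for the dataloader config; only `dataset_type=video` comes from the feature list above (set `model_family=ltxvideo` and `Lightricks/LTX-Video` as the model path in your trainer config as usual), while the id, path, and caption strategy are illustrative:

```json
[
  {
    "id": "my-videos",
    "type": "local",
    "dataset_type": "video",
    "instance_data_dir": "/data/videos",
    "caption_strategy": "textfile"
  }
]
```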
Pull requests
- fix SDXL time ids by @bghira in #1327
- merge by @bghira in #1328
- update dependency pins by @bghira in #1329
- update apple dependencies by @bghira in #1331
- experimental: add single-file loading support for sdxl, flux, sd3 by @bghira in #1334
- merge by @bghira in #1335
- LTX Video training support by @bghira in #1341
- merge by @bghira in #1342
Full Changelog: v1.2.5...v1.3.0
v1.2.5
Features
- New `dataset_type=eval` allows setting up your own eval splits for stable validation loss calculations (sketch below)
- Prodigy allows use of LR schedulers now
- Minimum and maximum aspect ratio options added to dataloader
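A minimal sketch of an eval split entry in the dataloader config, assuming it takes the same shape as a regular image dataset entry; only `dataset_type=eval` is taken from the feature list, the rest is illustrative:

```json
[
  {
    "id": "holdout-eval",
    "type": "local",
    "dataset_type": "eval",
    "instance_data_dir": "/data/eval-images",
    "caption_strategy": "textfile"
  }
]
```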
Bugfixes
- Fix for S3 data backend + compressed datasets
- VAE cache deletion at epoch end no longer crashes training when delete failures occur (multi-GPU)
- Regularisation datasets will no longer break Lycoris training on resume
Pull requests
- s3 backend fix for reading compressed cache by @bghira in #1297
- vae cache refresh during training should not treat delete failures as errors by @bghira in #1298
- update lycoris defaults, fix regularised training resume by @bghira in #1300
- prodigy may work best with cosine scheduler over long training runs by @bghira in #1302
- (#1113) add minimum and maximum aspect ratio bucket parameter and associated tests by @bghira in #1303
- add stable loss calculation and tracking by @bghira in #1304
Full Changelog: v1.2.4...v1.2.5
v1.2.4 - The Prodigal child returns
Stable Diffusion 3.5 Medium fine-tuned on v1.2.4
Features
Ignore final epochs for changes in dataloader length
- Use `--ignore_final_epochs=true` to disable tracking of epochs so that your max train steps value is reached (see the sketch below).
- This is helpful if you remove or add a substantial amount of data from your training set.
- Remember to use `--max_train_steps` instead of `--num_epochs` when using this option.
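A sketch of the combination described above in `config.json` form; the flag spellings follow the bullet points, and whether your config keeps the leading dashes or quotes the values depends on how it was generated:

```json
{
  "--ignore_final_epochs": "true",
  "--max_train_steps": 10000
}
```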
New experimental Prodigy optimiser
- Thanks to @LoganBooker we now have a new implementation of Prodigy that supports stochastic rounding and other features needed to reintroduce support.
  - You may want to adjust `--optimizer_config=d_coef=1` to a lower value to make the ramp-up and max LR lower (sketch below).
  - Changing LR is not currently tested/supported.
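A hedged sketch of the suggested adjustment in `config.json` form; `d_coef=0.5` is just one example of "a lower value", and the `--optimizer` key is an assumption about how Prodigy is selected in your setup:

```json
{
  "--optimizer": "prodigy",
  "--optimizer_config": "d_coef=0.5"
}
```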
Bugfixes
Image preprocessing
- VAE cache elements were being cropped too far for square input images that were larger than the target resolution.
- Example: An input of 1024x1024 with resolution of 512 and resolution_type=pixel_area or =area would be overly cropped straight from 1024px to 512px
- Not impacted: An input of 1024x1024 with a resolution of 1024 and any resolution_type worked as intended
- Not impacted: An input of 1024px and resolution of 512 with resolution_type=pixel
You'll want to recreate VAE caches and dataset metadata to benefit from this bugfix.
To remove the metadata, find and delete the `*.json` files from your image directories.
Sana
- Fixed modeling code after the PEFT LoRA addition broke compatibility
AMD ROCm
- Updated the list of BNB optimisers; another minor fix for MI300+ users
Validations
- Fixed validations not running for the final model export at the end of training
What's Changed
- instanceprompt strategy fix for caption discovery by @bghira in #1271
- sana: fix modeling code reference to attention_kwargs by @bghira in #1273
- Small fixes for running on AMD GPUs by @rkarhila-amd in #1276
- add a special case for square input images where we need to resize to the target as intermediary, which can be considered a safe operation by @bghira in #1281
- catch and handle wandb error when it is disabled by @bghira in #1282
- add debug logging for factory initialisation in multigpu systems where it seems to get stuck, and format some files by @bghira in #1283
- add ignore_final_epochs to workaround epoch tracking oddness when changing dataloader length by @bghira in #1285
- fix divide by zero when reducing dataloader length by @bghira in #1286
- update options doc for --ignore_final_epochs by @bghira in #1287
- fix check for running the final validations by @bghira in #1290
- Fixing broken python-tests action by @diodotosml in #1293
- add prodigy optimiser with full bf16 support by @bghira in #1294
- update docs for optimizer args by @bghira in #1295
New Contributors
- @rkarhila-amd made their first contribution in #1276
- @diodotosml made their first contribution in #1293
Full Changelog: v1.2.3...v1.2.4