Conversation

@BuffMcBigHuge (Collaborator) commented Aug 12, 2025

Overview

This pull request introduces comprehensive support for customizable schedulers and samplers in StreamDiffusion, enabling more flexible and efficient diffusion processes. It builds on the existing pipeline to allow users to specify schedulers (e.g., LCM) and samplers (e.g., normal) via configuration, improving generation quality, speed, and compatibility with advanced features like ControlNet and IPAdapter. Additional enhancements include better LoRA handling to resolve conflicts, TensorRT engine optimizations for robustness, a quiet mode for cleaner logging, and minor UI/dependency updates.

Note: Several schedulers (DDIMScheduler, DPMSolverSDEScheduler, etc.) were tested and found to be incompatible with t_index_list and the scalars/tensors used by StreamDiffusion's LCM path. TCD was found to be compatible and met the expected performance requirements.

The implementation refactors the core wrapper and pipeline to integrate scheduler/sampler logic dynamically, supports Trajectory Consistency Distillation (TCD) for ControlNet, and ensures backward compatibility with existing setups. This enables experimentation with different diffusion strategies without recompiling engines from scratch.

New Features

  • Scheduler and Sampler Integration:

    • Added configurable scheduler (default: LCM) and sampler (default: normal) support in the core pipeline and wrapper.
    • Users can now specify these via YAML configs (e.g., scheduler: lcm, sampler: normal), affecting timestep scaling and noise prediction; see the example config after this list.
    • Integrated into engine building and inference, with dynamic recalculation of timestep-dependent parameters (e.g., scalings for boundary conditions).
    • Supports LCM-LoRA modes and unified UNet exports for schedulers/samplers.
    • Currently supports the LCM and TCD schedulers only.
  • Enhanced LoRA Handling:

    • Introduced a hashed signature for LoRA sets in engine naming (e.g., --lora-{num}-{hash}), allowing multiple LoRAs without path conflicts or invalid filenames; a sketch of the hashing appears after this list.
    • Removed use_lcm_lora in favor of lora_dict while remaining backward compatible.
    • Fixed LoRA and IPAdapter conflicts by adjusting engine setup and requirements, ensuring stable fusion during exports.
    • Added LoRA signature to engine paths for better caching and reproducibility.
  • ControlNet Trajectory Consistency Distillation (TCD):

    • Implemented TCD support for ControlNet, improving temporal stability in video/streaming generations; a plain-diffusers illustration of the TCD scheduler appears after this list.
    • Integrated into pipeline updates for consistent application across timesteps.
  • Quiet Mode for Uvicorn:

    • Added --quiet flag to suppress INFO-level uvicorn logs (e.g., access logs), reducing noise during debugging and production runs.
    • Configurable via environment (QUIET=True) or CLI, with logger adjustments in the realtime-img2img demo; see the logging sketch after this list.
  • TensorRT Inference Improvements:

    • Added input filtering in the infer method to only pass supported tensors to the engine, preventing binding errors from extras like text embeds or time IDs.
    • Enhanced engine manager with LoRA-aware path generation and optional params for IPAdapter scale/tokens.
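
As a concrete illustration of the config surface, a minimal YAML sketch is below. Only the scheduler and sampler keys (and their lcm/normal defaults) come from this PR; the surrounding keys and the model path are assumed placeholders:

```yaml
# Hypothetical StreamDiffusion config sketch; exact schema may differ
model_id: "stabilityai/sd-turbo"   # assumed placeholder
t_index_list: [32, 45]
scheduler: lcm     # supported: lcm, tcd
sampler: normal    # default sampler
```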
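
The hashed LoRA signature can be sketched as follows; compute_lora_signature is a hypothetical helper for illustration, not this PR's actual function:

```python
import hashlib

def compute_lora_signature(lora_dict: dict) -> str:
    """Derive a short, order-independent hash for a set of LoRAs.

    lora_dict maps LoRA paths/names to weights, as in StreamDiffusion's lora_dict.
    """
    # Sort entries so the same LoRA set always produces the same signature
    canonical = "|".join(f"{path}:{weight}" for path, weight in sorted(lora_dict.items()))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return f"--lora-{len(lora_dict)}-{digest}"

# e.g. compute_lora_signature({"a.safetensors": 1.0, "b.safetensors": 0.5})
# -> "--lora-2-<hash>", appended to the engine path instead of raw file paths
```

Hashing instead of embedding raw paths keeps engine filenames valid and stable regardless of how many LoRAs are loaded.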
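
For orientation, this is what selecting the TCD scheduler looks like in plain diffusers; StreamDiffusion wires the equivalent choice through its own pipeline and YAML config rather than this direct call:

```python
import torch
from diffusers import AutoPipelineForText2Image, TCDScheduler

# Illustrative only; the model choice is an assumed placeholder
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
)
# Swap the default scheduler for TCD, reusing the existing scheduler config
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
```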
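
The quiet mode boils down to raising the level on uvicorn's standard loggers; a minimal sketch (the demo's actual wiring may differ):

```python
import logging
import os

def apply_quiet_mode(quiet: bool) -> None:
    """Suppress INFO-level uvicorn output, including per-request access logs."""
    if quiet:
        for name in ("uvicorn", "uvicorn.access", "uvicorn.error"):
            logging.getLogger(name).setLevel(logging.WARNING)

# Honors QUIET=True from the environment, mirroring the --quiet CLI flag
apply_quiet_mode(os.environ.get("QUIET", "").lower() == "true")
```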

Dependencies

  • Updated core libs: diffusers remains at 0.35.0; transformers to 4.55.4; peft to 0.17.1; accelerate to 1.10.0; huggingface_hub to 0.34.4.
  • No new dependencies; the conditional xformers requirement remains.
  • Ensured compatibility with existing TensorRT exports (e.g., unet_unified_export.py, unet_ipadapter_export.py).

@BuffMcBigHuge changed the title from Schedulers & Samplers to feat: Schedulers & Samplers on Aug 12, 2025
@BuffMcBigHuge marked this pull request as ready for review on September 16, 2025 at 00:36
```python
        self.graph = None

    def infer(self, feed_dict, stream, use_cuda_graph=False):
        # Filter inputs to only those the engine actually exposes to avoid binding errors
```
@BuffMcBigHuge (author) commented:

Not 100% sure about this
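
For reference, a minimal sketch of that filtering against the engine's declared inputs, using TensorRT's tensor-name API; the attribute names and surrounding plumbing are assumptions, not this PR's exact code:

```python
import tensorrt as trt

def filter_feed_dict(engine: trt.ICudaEngine, feed_dict: dict) -> dict:
    """Drop tensors the engine does not declare as inputs.

    Extras such as text embeds or time IDs that a given UNet export does
    not expose would otherwise raise binding errors at inference time.
    """
    input_names = {
        engine.get_tensor_name(i)
        for i in range(engine.num_io_tensors)
        if engine.get_tensor_mode(engine.get_tensor_name(i)) == trt.TensorIOMode.INPUT
    }
    return {name: tensor for name, tensor in feed_dict.items() if name in input_names}
```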

@BuffMcBigHuge changed the title from feat: Schedulers & Samplers to TCD Scheduler + LoRA IPAdapter SDXL on Sep 16, 2025
```diff
 diffusers==0.35.0
-transformers==4.56.0
-peft==0.18.0
+transformers==4.55.4
+peft==0.17.1
```
@BuffMcBigHuge (author) commented:

These dep versions changed for Windows support.


```diff
 # Create prefix (from wrapper.py lines 1005-1013)
-prefix = f"{base_name}--lcm_lora-{use_lcm_lora}--tiny_vae-{use_tiny_vae}--min_batch-{min_batch_size}--max_batch-{max_batch_size}"
+prefix = f"{base_name}--tiny_vae-{use_tiny_vae}--min_batch-{min_batch_size}--max_batch-{max_batch_size}"
```
@BuffMcBigHuge (author) commented Sep 16, 2025:

This will cause engines to rebuild - so it's easiest to remove lcm_lora-{use_lcm_lora}-- from any engines you've already built.

A collaborator replied:

ty for the heads up on this

@BuffMcBigHuge mentioned this pull request on Sep 23, 2025