Add StableDiffusion3InstructPix2PixPipeline #11378

Merged
merged 21 commits into from
Apr 30, 2025

Conversation

xduzhangjiayu
Contributor

What does this PR do?

Add StableDiffusion3InstructPix2PixPipeline
Would you like to give it a review? Many thanks~ @yiyixuxu @asomoza

@asomoza
Member

asomoza commented Apr 22, 2025

Hi and thanks for your contribution. Is there a model for this pipeline so I can test it?

@xduzhangjiayu
Contributor Author

xduzhangjiayu commented Apr 22, 2025

Hi, thanks for the reply!
You can use the model I trained myself from https://huggingface.co/CaptainZZZ/sd3-instructpix2pix/tree/main; you only need to replace the original transformer from the official SD3 with it. I have already tested it and the result is reasonable. Or, for better performance, you can use another, more powerful model from https://huggingface.co/BleachNick/SD3_UltraEdit_freeform/tree/main/transformer
@asomoza

@asomoza
Member

asomoza commented Apr 23, 2025

I did a test but I get a bad result:

import torch

from diffusers.pipelines.stable_diffusion_3.pipeline_stable_diffusion_3_instruct_pix2pix import (
    StableDiffusion3InstructPix2PixPipeline,
)
from diffusers.utils import load_image


resolution = 1024
image = load_image("https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png").resize(
    (resolution, resolution)
)
edit_instruction = "Turn sky into a cloudy one"

pipe = StableDiffusion3InstructPix2PixPipeline.from_pretrained(
    "BleachNick/SD3_UltraEdit_freeform", torch_dtype=torch.float16
)

pipe.enable_model_cpu_offload()

edited_image = pipe(
    prompt=edit_instruction,
    image=image,
    height=resolution,
    width=resolution,
    guidance_scale=7.5,
    image_guidance_scale=1.5,
    num_inference_steps=30,
).images[0]

edited_image.save("edited_image.png")

and I get this image:

edited_image

I also tried replacing the transformer model with yours but got the same result.

I don't have the time to look into this right now, can you solve the issue? Also, ideally, if this works you will need to add the corresponding doc page. To be able to load it like the example in the docstring (from diffusers import StableDiffusion3InstructPix2PixPipeline), you will also need to add it to the __init__.py of the pipelines and to the main diffusers one.

But still, the priority here should be to make it work and to demo an example with it.
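
For readers unfamiliar with the two scales in the snippet above, here is a minimal sketch of the dual classifier-free guidance described in the InstructPix2Pix paper. It is an assumption, not something confirmed in this thread, that the pipeline in this PR combines its noise predictions the same way. The function name `combine_guidance` and the plain-list inputs are hypothetical and used only for illustration; a real pipeline operates on tensors.

```python
from typing import List, Sequence


def combine_guidance(
    noise_uncond: Sequence[float],
    noise_image: Sequence[float],
    noise_text: Sequence[float],
    guidance_scale: float = 7.5,
    image_guidance_scale: float = 1.5,
) -> List[float]:
    """Combine three noise predictions, InstructPix2Pix style (hypothetical helper).

    noise_uncond: prediction with neither image nor text conditioning
    noise_image:  prediction conditioned on the input image only
    noise_text:   prediction conditioned on both the image and the instruction
    """
    return [
        # Start from the unconditional prediction, push towards the image,
        # then push from the image-only prediction towards the instruction.
        u + image_guidance_scale * (i - u) + guidance_scale * (t - i)
        for u, i, t in zip(noise_uncond, noise_image, noise_text)
    ]


# With both scales set to 1.0 the result is the fully conditioned prediction.
print(combine_guidance([0.5], [1.0], [2.0], guidance_scale=1.0, image_guidance_scale=1.0))
```

Under this formulation, a higher image_guidance_scale keeps the output closer to the input image, while a higher guidance_scale follows the edit instruction more strongly.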

@xduzhangjiayu
Contributor Author

xduzhangjiayu commented Apr 24, 2025

Hi @asomoza
Sorry I didn't mention before that the model was trained on 512×512 images, so it is better to use 512×512 as the input.
I changed the following code and used my transformer model:

resolution = 512
image = load_image("https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png").resize(
    (resolution, resolution)
)
edit_instruction = "Turn sky into a sunny one"
And I got the image below:

edited_image

And could you please tell me where I should add the doc page? Thanks~
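
The resolution advice above can be sketched as a small helper that picks an input size near the training resolution. This is only an illustration: `snap_resolution` is a hypothetical name, and the rounding to a multiple of 16 is an assumption about what the SD3 pipelines accept, not something stated in this thread. Unlike the snippet above, which resizes to a square, this sketch keeps the aspect ratio.

```python
from typing import Tuple


def snap_resolution(width: int, height: int, target: int = 512, multiple: int = 16) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is about `target` pixels.

    Each side is rounded to the nearest multiple of `multiple`
    (assumed requirement, see the note above) and never drops below it.
    """
    scale = target / max(width, height)

    def snap(value: int) -> int:
        # Round the scaled side to the nearest allowed multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)


# A 1024x1024 input maps to the 512x512 size the model was trained on.
print(snap_resolution(1024, 1024))
```

The returned pair can then be passed to the image's resize call and to the pipeline's height and width arguments.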

@asomoza
Member

asomoza commented Apr 24, 2025

@xduzhangjiayu the code I used was the one that's in the docstring of the pipeline, so probably it's better to change it there.

> And could you please tell me where I should add the doc page?

For the docs, it's inside docs/source/en; you can learn from other implementations like this one.

But looking at the quality of the model and the work involved, I would recommend moving this pipeline to the community examples. It would be easier to do, and we can move it to core later if it gets popular. This way you don't have to write docs and we can merge it faster.

@xduzhangjiayu
Contributor Author

Hi @asomoza
I agree, and I have already moved the pipeline to community. What else do I need to write for this PR? Thanks~

@asomoza
Member

asomoza commented Apr 29, 2025

@xduzhangjiayu thanks, can you please add a small description and a functional snippet of code to run it in the README file?

@xduzhangjiayu
Contributor Author

Hi @asomoza
Done! Please check~

@asomoza
Member

asomoza commented Apr 29, 2025

We don't use the same directory for hosting images, so I took the liberty of uploading your images to the hub here:

https://huggingface.co/datasets/diffusers/docs-images/resolve/main/StableDiffusion3InstructPix2Pix/mountain.png
https://huggingface.co/datasets/diffusers/docs-images/resolve/main/StableDiffusion3InstructPix2Pix/edited.png

can you delete the images from this PR and use these links instead please?

@xduzhangjiayu
Contributor Author

@asomoza Done

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@asomoza asomoza left a comment

thanks, just some minor suggestions

@@ -86,6 +86,7 @@ PIXART-α Controlnet pipeline | Implementation of the controlnet model for pixar
| Perturbed-Attention Guidance |StableDiffusionPAGPipeline is a modification of StableDiffusionPipeline to support Perturbed-Attention Guidance (PAG).|[Perturbed-Attention Guidance](#perturbed-attention-guidance)|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/perturbed_attention_guidance.ipynb)|[Hyoungwon Cho](https://github.com/HyoungwonCho)|
| CogVideoX DDIM Inversion Pipeline | Implementation of DDIM inversion and guided attention-based editing denoising process on CogVideoX. | [CogVideoX DDIM Inversion Pipeline](#cogvideox-ddim-inversion-pipeline) | - | [LittleNyima](https://github.com/LittleNyima) |
| FaithDiff Stable Diffusion XL Pipeline | Implementation of [(CVPR 2025) FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolutionUnleashing Diffusion Priors for Faithful Image Super-resolution](https://arxiv.org/abs/2411.18824) - FaithDiff is a faithful image super-resolution method that leverages latent diffusion models by actively adapting the diffusion prior and jointly fine-tuning its components (encoder and diffusion model) with an alignment module to ensure high fidelity and structural consistency. | [FaithDiff Stable Diffusion XL Pipeline](#faithdiff-stable-diffusion-xl-pipeline) | [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/jychen9811/FaithDiff) | [Junyang Chen, Jinshan Pan, Jiangxin Dong, IMAG Lab, (Adapted by Eliseu Silva)](https://github.com/JyChen9811/FaithDiff) |
| Stable Diffusion 3 InstructPix2Pix Pipeline | Implementation of Stable Diffusion 3 InstructPix2Pix Pipeline | [Stable Diffusion 3 InstructPix2Pix Pipeline](#stable-diffusion-3-instructpix2pix-pipeline) | [![Hugging Face Models]()](https://huggingface.co/BleachNick/SD3_UltraEdit_freeform) [![Hugging Face Models]()](https://huggingface.co/CaptainZZZ/sd3-instructpix2pix) | [Jiayu Zhang](https://github.com/xduzhangjiayu) and [Haozhe Zhao](https://github.com/HaozheZhao)|
Member

Suggested change
| Stable Diffusion 3 InstructPix2Pix Pipeline | Implementation of Stable Diffusion 3 InstructPix2Pix Pipeline | [Stable Diffusion 3 InstructPix2Pix Pipeline](#stable-diffusion-3-instructpix2pix-pipeline) | [![Hugging Face Models]()](https://huggingface.co/BleachNick/SD3_UltraEdit_freeform) [![Hugging Face Models]()](https://huggingface.co/CaptainZZZ/sd3-instructpix2pix) | [Jiayu Zhang](https://github.com/xduzhangjiayu) and [Haozhe Zhao](https://github.com/HaozheZhao)|
| Stable Diffusion 3 InstructPix2Pix Pipeline | Implementation of Stable Diffusion 3 InstructPix2Pix Pipeline | [Stable Diffusion 3 InstructPix2Pix Pipeline](#stable-diffusion-3-instructpix2pix-pipeline) | [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/BleachNick/SD3_UltraEdit_freeform) [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/CaptainZZZ/sd3-instructpix2pix) | [Jiayu Zhang](https://github.com/xduzhangjiayu) and [Haozhe Zhao](https://github.com/HaozheZhao)|

Comment on lines 5475 to 5477
![Original image](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/StableDiffusion3InstructPix2Pix/mountain.png)
![Edited image](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/StableDiffusion3InstructPix2Pix/edited.png)
Member

Suggested change
### Result
![Original image](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/StableDiffusion3InstructPix2Pix/mountain.png)
![Edited image](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/StableDiffusion3InstructPix2Pix/edited.png)
|Original|Edited|
|---|---|
|![Original image](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/StableDiffusion3InstructPix2Pix/mountain.png)|![Edited image](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/StableDiffusion3InstructPix2Pix/edited.png)

@xduzhangjiayu
Contributor Author

@asomoza OK, done

@asomoza
Member

asomoza commented Apr 30, 2025

@bot /style

Contributor

Style fixes have been applied. View the workflow run here.

@asomoza
Member

asomoza commented Apr 30, 2025

thanks!

@asomoza asomoza merged commit 8cd7426 into huggingface:main Apr 30, 2025
8 of 9 checks passed

3 participants