add PAG support for Stable Diffusion 3 #8861
Conversation
I have some minor comments but other than those this looks quite sleek!
thanks a ton for working on this!
Co-authored-by: Sayak Paul <[email protected]>
I was playing with this again because I needed a cool generation, so I gave it a revisit, and I found a neat application I'd like to share. I found two things. First, I had been using a PAG scale that was too high; just using something like 0.5 or 0.7 gives good results that clean up the image a little. For example:

The original has a lot of detail, which always impresses me with how good the SD3 VAE is, but it has some defects, like the shape of the tires and the weirdness of the front bumper, among other things. Just applying a low PAG scale to specific layers cleans up the image a lot and makes the generation better. The only problem is that it also makes the truck itself cleaner, which is wrong, but maybe that can be fixed with some other layers, or this can be used as a base for a better generation.

The second thing I found is that we can change the "style" of the generation just by applying PAG. For example, without changing anything in the prompt, I can change this generation to a more concept-art style, which I think is really cool.
And if I change the prompt to "concept art" instead of "photo":
It seems to help keep the composition consistent even if you change the style in the prompt.
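(A minimal sketch of this kind of low-scale cleanup pass, assuming the `AutoPipelineForText2Image` PAG entry point from this PR; the checkpoint id, prompt, and layer choice are illustrative.)

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the SD3 PAG variant; the layers to perturb are an illustrative choice.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    enable_pag=True,
    pag_applied_layers=["blocks.10", "blocks.13"],  # assumption: layers that clean up details
    torch_dtype=torch.float16,
).to("cuda")

# A low pag_scale (0.5-0.7) acts as a subtle cleanup rather than a style shift.
image = pipe(
    "photo of an old pickup truck in a forest",  # illustrative prompt
    guidance_scale=5.0,
    pag_scale=0.7,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("cleaned.png")
```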
@sunovivid Just for notification: we merged #8936 now. It should be easier to directly apply PAG without the extra SD3PAG class; you'll have to pass the PAG-related SD3 attention processors as a parameter when calling.

Thanks for the amazing work here! Although a bit unrealistic, it might be nice to get this ready by tomorrow, since a Diffusers release will be happening soon and it'd be nice to ship a PAG variant for SD3.
@a-r-r-o-w OK! I'd like to make it in time, so I'll try it now. Thank you for your awesome refactoring!
@sunovivid Thank you for the kind words! This is really close to merging, just a couple of things:
@a-r-r-o-w I fixed the code following all your comments and added tests! Thank you for your comments and guidance.
Co-authored-by: Aryan <[email protected]>
That sounds good as well. I tried it and there isn't a significant impact on speed (a few extra seconds, but that's okay, and it should be good to test with at least two layers).
Thanks! I set
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
OK. There is another error. Sorry for bothering you. I'll check it out.
@a-r-r-o-w Thank you for your patience! The failure in the first test case was due to using joint attention in SD3, which uses a different layer name.

The second test case failure was harder to pinpoint. It occurred because this is a randomly initialized model, and the random weights caused the output to degenerate.

Sample grid using CFG and PAG (from left to right, PAG = [1.0, 1.5, 3.0, 5.0]; from top to bottom, CFG = [0.5, 0.75, 1.0, 3.0]), with the prompts:

- "a photo of a cat holding a sign that says hello world"
- "19th century Scottish wizard with a mysterious smile and a piercing gaze, enigmatic, photorealistic, incredibly detailed, sharpness, detail, cinematic lighting"

Sampling code I used (a bit dirty, but included for reproducibility):
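(A rough reconstruction of such a sweep, assuming the PAG pipeline API from this PR; the checkpoint id, layer choice, and seed are illustrative.)

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import make_image_grid

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    enable_pag=True,
    pag_applied_layers=["blocks.13"],  # illustrative layer choice
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a photo of a cat holding a sign that says hello world"
cfg_scales = [0.5, 0.75, 1.0, 3.0]  # rows of the grid
pag_scales = [1.0, 1.5, 3.0, 5.0]   # columns of the grid

images = []
for cfg in cfg_scales:
    for pag in pag_scales:
        images.append(
            pipe(
                prompt,
                guidance_scale=cfg,
                pag_scale=pag,
                generator=torch.Generator("cuda").manual_seed(0),
            ).images[0]
        )

make_image_grid(images, rows=len(cfg_scales), cols=len(pag_scales)).save("grid.png")
```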
Therefore, I have removed the assert statement in the test; I think the remaining checks are sufficient.

Separately, a new test case failure has appeared that I am not aware of. It seems to be related to the internal model. Do you know why this failure case was added?
Thanks for your awesome contributions, @sunovivid!
add pag sd3

---------

Co-authored-by: HyoungwonCho <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: crepejung00 <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: Aryan <[email protected]>
What does this PR do?
This is a PR for adding PAG support for SD3!
SD3 differs slightly from SD and SDXL because it employs Rectified Flow and uses the MMDiT backbone rather than a UNet.
For the joint attention in MMDiT, we can apply perturbed self-attention by masking attention between image patches, following the principles of PAG.
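For intuition, here is a minimal single-head sketch of the perturbed joint attention (conceptual only; not the exact processor added in this PR):

```python
import torch
import torch.nn.functional as F

def perturbed_joint_attention(q, k, v, num_text_tokens):
    """Conceptual perturbed self-attention (PSA) over joint [text, image] tokens.

    The image-to-image attention map is replaced by the identity, so each
    image patch attends only to itself and returns its own value vector.
    Tensors are (batch, seq_len, dim); single-head for clarity.
    """
    # Text tokens attend normally over all tokens (one plausible choice here).
    text_out = F.scaled_dot_product_attention(q[:, :num_text_tokens], k, v)
    # Image tokens: identity attention map -> output is just their own values.
    image_out = v[:, num_text_tokens:]
    return torch.cat([text_out, image_out], dim=1)
```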
It works quite well. Here is an example with PAG + SD3.
Examples
"Pirate ship trapped in a cosmic maelstrom nebula"

From left to right, the PAG scale increases to 0.5, 1.0, 3.0, 5.0, and 7.0.
From top to bottom, the CFG scale increases to 1.0, 3.0, 5.0, and 7.0.
How to Use
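A minimal usage sketch, assuming the `AutoPipelineForText2Image` entry point with `enable_pag` (the checkpoint id and layer choice are illustrative):

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    enable_pag=True,                   # route to the PAG variant of the SD3 pipeline
    pag_applied_layers=["blocks.13"],  # which MMDiT attention layers to perturb
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "Pirate ship trapped in a cosmic maelstrom nebula",
    guidance_scale=5.0,  # CFG scale
    pag_scale=3.0,       # PAG scale
).images[0]
image.save("pag_sd3.png")
```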
Ablations on Layer
For those interested in knowing which layer to perturb, I am providing the results of single-layer perturbation for all the MMDiT attention layers in SD3.
Each grid below shows, from left to right, guidance scale 3.0, 5.0, 7.0. The six grids, in order, perturb `pag_applied_layer` 0-3, 4-7, 8-11, 12-15, 16-19, and 20-23; within each grid, the perturbed layer index increases from top to bottom.






Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@yiyixuxu @asomoza
@sayakpaul @a-r-r-o-w might also be interested

Thank you for your time!