Beginner's Guide - Generate Videos With SwarmUI #716
So, you want to generate AI videos with SwarmUI? Don't worry, it's easy!
(Forenote: this guide was written in April 2025. Things are likely to change in the future, and this guide will eventually be outdated.)
Part One: Pick A Video Model
Video models supported in SwarmUI are documented in the Video Model Support document. That page is kept up to date, with a list of all supported model classes, details about the unique usage needs of each, a chart to guide you on picking the right model, and a general recommendation for what most users should pick.
When you're first starting out with video gen, keep it simple: Use the base models for the given model class, don't play with parameters too much, and use easy / friendly test content to generate. If something goes wrong, you might have to ask for help, and you don't want to show someone your weirdest gens or unreadably long prompt/parameter piles.
Once you've got the basics down, move on to generating what you're actually hoping for. Search Civitai or other model sites for finetuned model variants or LoRAs that fit what you're after, and feed the model with prompts/parameters that you actually want.
At time of writing, the leading video model class is Wan 2.1. In this case, the docs give a pretty long list of install options. Because I'm running an RTX 4090, I can fit the large variant (14B), and I'll have the best performance with fp8 models. I want both text2video and image2video models, and for i2v I prefer the faster option instead of the higher-res option. So I'm grabbing Wan 2.1 Text2Video 14B fp8_scaled and Wan 2.1 Image2Video 14B 480p fp8_scaled. Your own choices might be different, and of course if you're reading this in the future there might be different options available.
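(If you want a rough sense of whether a given variant will fit your card, you can ballpark the weights' memory footprint from parameter count and precision. This is back-of-envelope only - it ignores the text encoder, VAE, activations, and any offloading Swarm does - but it explains why fp8 of the 14B model is comfortable on a 24GB card:)

```python
# Rough VRAM footprint of model weights alone: params * bytes-per-param.
# Ignores text encoder, VAE, activations, and any offloading Swarm does.
def weight_gib(param_count: float, bytes_per_param: float) -> float:
    return param_count * bytes_per_param / (1024 ** 3)

for name, bytes_pp in [("fp16/bf16", 2), ("fp8", 1)]:
    print(f"Wan 14B @ {name}: ~{weight_gib(14e9, bytes_pp):.1f} GiB")
    print(f"Wan 1.3B @ {name}: ~{weight_gib(1.3e9, bytes_pp):.1f} GiB")
```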
Download the model and save it in the relevant folder (usually diffusion_models). I prefer to organize my models into subfolders, so I'm saving these Wan models into SwarmUI/Models/diffusion_models/Wan/. Refresh your models list in Swarm and make sure the model shows up. Feel free to click the "=" menu on the model and then "Edit Metadata" to add some extra info or an icon to the model.
In my actual personal setup, my Wan folder is full of a bunch of different Wan variant models, and I have tacked on lazy icons to recognize a few of them more easily.
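(If you download a lot of models, a tiny script can keep that folder organized for you. This is just a hypothetical helper - the download location and filename pattern are examples, so adjust them to match your own setup:)

```python
from pathlib import Path
import shutil

# Example paths only - adjust to your own SwarmUI install and download location.
downloads = Path.home() / "Downloads"
wan_folder = Path("SwarmUI/Models/diffusion_models/Wan")
wan_folder.mkdir(parents=True, exist_ok=True)

# Move any downloaded Wan safetensors files into the Wan subfolder.
for model_file in downloads.glob("wan2.1*.safetensors"):
    shutil.move(str(model_file), wan_folder / model_file.name)
    print(f"Moved {model_file.name} -> {wan_folder}")
```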
Part Two: Basic Text2Video
Setup T2V
Text-To-Video is the most basic form of AI video generation. You type a prompt, you get a video, done deal. To be honest with you, it's not a great method, for reasons we'll get into later... but it's usually fast and easy, and every model class supports it, so let's start there.
In your models list, click your Text2Video model to select it.
Make sure your other parameters are default - if you're not sure, click "Quick Tools" at the top right, then "Reset Params to Default"

In your parameter list (the left sidebar), configure parameters according to the video model support doc and your choices.
In my case, with Wan Text2Video 14B, I've made the adjustments the doc calls for. One worth highlighting: I set the "Text2Video Format" to gif-hd, which is the best format GitHub natively embeds. I usually prefer webp, but a lot of sites don't support that.
Understanding Params
Note: When in doubt, there's docs! Are you curious, for example, about what the options for that "Text2Video Format" param actually are? Just click that "?" button.
SwarmUI is covered in docs, both in the docs folder (where the video model support doc is) and in-line in the UI. You should never feel completely lost while working in SwarmUI - there's always a way to figure things out. Worst case scenario, if nothing in the UI or the docs clarifies it, come ask on the Discord.
Generate
Now, the most important parameter: prompt! I want something dramatic, but cute, which represents how cool it is that SwarmUI is generating videos for me... so how about
real video of a cat walking through a dimly lit rainbow forest, beneath a neon sign that reads "Swarm UI", shot on Sony a6100
Different models have different prompting needs. Wan is a model that likes simple, clear English or Chinese language sentences. A minimal bit of "tagging" can help guide style, but don't overdo it - in this case I'll just add "shot on Sony a6100" to encourage it to look like a real camera video instead of a cartoon aesthetic.
Then... hit that big "Generate" button! Wan-14B is pretty slow; this took me about 3 and a half minutes to generate:

That... is decent, but not quite what I was hoping for. It's got all the pieces, but not really focused on the cat walking around like I wanted.
If speed is an issue, other models are faster - Wan 1.3B, for example, or LTX-V which is quite fast - but check the video model support doc for up-to-date recommendations.
If you don't like the outcome, try changing the basic parameters - frame count, prompt, resolution, etc. - and try again. Or, generate again without changing any param (with Seed set to -1, i.e. randomize) to see if you'll get lucky on the next try.
I recommend always doing a variety of generations with any new model while you're starting, just to get familiar with how the model responds to inputs.
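If you'd rather script that kind of exploration than click Generate over and over, SwarmUI also has an HTTP API (documented in the docs folder alongside the video model support doc). The sketch below leans on my reading of those docs - the GetNewSession / GenerateText2Image routes, the default port 7801, and the parameter keys are assumptions that may differ in your version, so check the API doc before relying on it:

```python
import requests

# Assumptions: a local SwarmUI at this address, and the GetNewSession /
# GenerateText2Image API routes described in SwarmUI's API docs.
# Parameter keys may differ in your version - verify against your install.
BASE = "http://localhost:7801"

session = requests.post(f"{BASE}/API/GetNewSession", json={}).json()
session_id = session["session_id"]

prompt = ('real video of a cat walking through a dimly lit rainbow forest, '
          'beneath a neon sign that reads "Swarm UI", shot on Sony a6100')

# Queue a handful of seed variations to get a feel for the model's range.
for seed in [1, 2, 3, 4]:
    result = requests.post(f"{BASE}/API/GenerateText2Image", json={
        "session_id": session_id,
        "images": 1,
        "prompt": prompt,
        "seed": seed,
    }).json()
    print(seed, result.get("images"))
```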
I fiddled with the params and played luck-o-the-seed a bit, and ended up with the video I used as the header of this guide using the same prompt and a different resolution.
Here I have a generation going where I can already see the composition isn't how I want it to be:

So I'm going to go ahead and click the "Interrupt" button to tell it to stop:

That will end the generation early (it may take a few seconds to process the interrupt) and allow you to immediately queue up a new attempt.
Watch The Gen
Most video models in SwarmUI natively support live previews, so while you're waiting for it to generate, you can watch a preview of the video that's coming.
Part Three: Text To Image To Video
Now let's talk about the approach I think is better for AI video generation: generate an image you really like, and then use an image-to-video model to make it move.
I prefer this because image models often run in seconds, so you can experiment a lot with images, whereas text2video often takes a while to generate - and you don't want to wait 3 minutes just to find out the result was bad. There's also tons of LoRAs and other customizations out there for image models, whereas video models often have fewer available.
Swarm makes text-to-image-to-video super easy to do, so let's go for it!
Set up your image generation
First, get image generation going. Basic image gen setup is covered in the Basic Usage Doc. In my case, I'm going to use Flux Dev with CFG=1 (required by Flux Dev), largely default parameters, and the same prompt as the above generation.

First try looks awesome.
Enable image to video
Now, let's enable the Image To Video parameter group and select the video model we're using (in my case Wan 14B 480p fp8).

Most parameters here you can leave default/unset; they will automatically default correctly. The big one you'll want to play with is of course frame count. That "Video Resolution" parameter is magic: it will by default automatically resize the Flux image (1024x1024) to the resolution set in the video model's metadata (in this case, 640x640), accounting for whatever aspect ratio you used too. Convenient!
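If you're curious what that resize actually works out to numerically, here's one plausible way to compute it: keep the source aspect ratio, target roughly the model's pixel area, and snap to friendly multiples. To be clear, this is my own sketch of the idea, not SwarmUI's actual code, so exact numbers may differ slightly:

```python
def video_resolution(src_w: int, src_h: int, model_res: int = 640, snap: int = 16):
    """Keep the source aspect ratio while targeting ~model_res^2 pixels."""
    aspect = src_w / src_h
    target_area = model_res * model_res
    h = (target_area / aspect) ** 0.5
    w = h * aspect
    # Snap to a multiple the model is comfortable with.
    return int(round(w / snap) * snap), int(round(h / snap) * snap)

print(video_resolution(1024, 1024))  # -> (640, 640), matching the example above
print(video_resolution(1216, 832))   # a 3:2-ish image lands around (768, 528)
```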
I'll once again use gif-hd so I can post to GitHub here.
Note I left "Video CFG" unchecked: Wan default CFG of 6 is perfectly fine, and Swarm will automatically apply the class-appropriate-default CFG to a video when that's unchecked. This is different from base model generation, where you're expected to set CFG yourself normally.
Generate the video
Now hit "Generate" again - you'll have an image generate, and then it will generate a video in which the first frame is the image you just made, and the rest of the video is hopefully in moving in a neat way.

Don't like the image you got, and don't want to wait for the video? Just hit that Interrupt button.
The video will be cancelled and you can try again.
In my case, the image and video it made are pretty neat, I think:

Alternatives
Like the image you got, but don't like the video you got? The other option available is direct image to video, covered below. You can simply generate images in advance, then separately go and generate videos of them. This lets you play with video params and roll seeds more.
Another concern that can arise here is you might simply run out of system RAM - two entire diffusion models loaded can eat up some space! In that case, you'll want to first generate images, then stop and switch to image-to-video generation.
Part Four: Direct Image To Video
Have an image of your content already, or generated one in advance with a text-to-image model? There's an app for that - er, there's an easy way to do that, too!
Set it up
First, drag your image onto the "Init Image" parameter, and set "Init Image Creativity" to 0 (!Important! Make sure creativity is set to 0! Forgetting this is a common mistake!)
In my case, I'm grabbing the flux gen I made earlier:

You'll also want to copy the image aspect ratio using the "Res" button next to Init Image

Double check your "Resolution" parameter is set how you expect it to be.
NOTE: Swarm's main Generate tab interface is an image generation system, and image2video is a special case normally reserved for text2image2video setups, so what we're doing here is a little trick where we set up text2image2video, but skip the text2image stage. That's why we're using "Init Image" with "Creativity=0", and why we need to be careful with model selection.
In the "Models" menu at the bottom, you can select any model you want, it doesn't particularly matter, because the text2image stage is being skipped - however it's common to select to the image-to-video model here just to avoid memory/load issues. Note that you cannot use dedicated image2video models as a real base model, we're only allowed to select it here because we're explicitly skipping that stage.
Now, the real setup: enable the "Image To Video" parameter group, and set things up how you want. Select the video model we're using (in my case Wan 14B 480p fp8). Most parameters here you can leave default/unset; they will automatically default correctly. The big one you'll want to play with is of course frame count.
For right now, I want to generate videos very quickly, so I'm going to set Frames down to 33, and I'm going to do a little trick: First, I set the Resolution to a custom 512x512:


Then, I'm going to set "Video Resolution" to "Image", meaning it copies my standard resolution parameter without any resize magic.
Without this, the default "Image Aspect, Model Res" would resize the image to the video model's default (640x640), but I want to go lower than that just to get some more speed.
And, of course, format gif-hd because I need to post my outputs here on GitHub. You'll probably use webp.
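Side note on picking frame counts: clip length is just frames divided by the model's output frame rate. Wan 2.1 outputs at 16 fps as far as I know (check the video model support doc for your model's rate), so a quick helper makes it obvious what you're asking for:

```python
def clip_seconds(frames: int, fps: float = 16.0) -> float:
    """Length of the resulting clip; fps here assumes Wan 2.1's 16 fps output."""
    return frames / fps

print(clip_seconds(33))  # ~2.1 seconds - my quick test setting above
print(clip_seconds(81))  # ~5.1 seconds - a more typical full-length Wan gen
```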
Here's my final params:

And 90 seconds later, I got a quick gen output:

... That's a bit wonky, not quite the type of rainbows I was hoping for. I'm not quite prompting this right!
The nice thing about Wan's I2V models is that prompting is actually very easy: we don't need to tell it what's in the image, it already knows! We only need to prompt the motion. Because I prompted for rainbows above, it added rainbow motion. I don't want that; I just want the cat to walk forward. Let's make it way simpler:

the cat walks forward through the forest
Wowza! Way better!
Part Five: Going Beyond
There's so much more you can do with video generation now that you've got the basics.
How about trying some other model classes? There's new ones all the time.
How about some high res / high length / high detail gens? Can you make something beautiful?
There's tons of performance/microquality/etc. hacks out there - TorchCompile, TeaCache, etc. - details are beyond the scope of this guide, but look around at what parameters are available in the "Advanced" section and what Extensions are available in the server tab for some options. Also don't be afraid to look at online discussions on Discord, GitHub, Reddit, etc. to see what the hot new techniques are.
Once you've got a good approach locked in, my favorite part: bulk automation! Set up a Text-To-Image-To-Video pipeline you like, get some prompt formats and wildcards that create great results, set "Images" to 100, hit "Generate", and go to bed. When you wake up in the morning, scroll through all the cool videos you generated overnight and hit the Star button on your favorites to save them to a special folder of your image history.
Want to bulk automate image-to-video? Fill up a folder on your PC with images, set the filenames to appropriate prompts for the images, then in SwarmUI use Tools -> Image Edit Batcher -> give it your input folder, pick an output folder, check "Use As Init" and "Append Filename to Prompt", then hit "Run Batch" (replaces the Generate button).
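If your source images don't already have prompt-worthy filenames, a small prep script can copy them into a batch folder renamed from captions you've written, so "Append Filename to Prompt" has something useful to work with. This is purely a hypothetical helper - the captions.txt format here is my own invention:

```python
from pathlib import Path
import shutil

# Hypothetical setup: captions.txt has lines like "cat01.png | the cat walks forward"
src = Path("my_images")
dst = Path("batch_input")
dst.mkdir(exist_ok=True)

for line in (src / "captions.txt").read_text(encoding="utf-8").splitlines():
    if "|" not in line:
        continue
    filename, caption = (part.strip() for part in line.split("|", 1))
    image = src / filename
    if image.exists():
        # Keep the caption filesystem-safe; it becomes part of the prompt later.
        safe = "".join(c for c in caption if c.isalnum() or c in " ,-_")
        shutil.copy(image, dst / f"{safe}{image.suffix}")
```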