@@ -4,278 +4,6 @@ title: Training
# :material-file-document: Training
- # Textual Inversion Training
- ## **Personalizing Text-to-Image Generation**
+ Invoke Training has moved to its own repository, with a dedicated UI for accessing common scripts like Textual Inversion and LoRA training.
- You may personalize the generated images to provide your own styles or objects
- by training a new LDM checkpoint and introducing a new vocabulary to the fixed
- model as a (.pt) embeddings file. Alternatively, you may use or train
- HuggingFace Concepts embeddings files (.bin) from
- <https://huggingface.co/sd-concepts-library> and its associated
- notebooks.
-
- ## **Hardware and Software Requirements**
-
- You will need a GPU to perform training in a reasonable length of
- time, and at least 12 GB of VRAM. We recommend using the [`xformers`
- library](../installation/070_INSTALL_XFORMERS.md) to accelerate the
- training process further. During training, about 8 GB is temporarily
- needed to store intermediate models, checkpoints, and logs.
-
- ## **Preparing for Training**
-
- To train, prepare a folder that contains 3-5 images that illustrate
- the object or concept. It is good to provide a variety of examples or
- poses to avoid overtraining the system. Format these images as PNG
- (preferred) or JPG. You do not need to resize or crop the images in
- advance, but for more control you may wish to do so.
-
- Place the training images in a directory on the machine InvokeAI runs
- on. We recommend placing them in a subdirectory of the
- `text-inversion-training-data` folder located in the InvokeAI root
- directory, ordinarily `~/invokeai` (Linux/Mac), or
- `C:\Users\your_name\invokeai` (Windows). For example, to create an
- embedding for the "psychedelic" style, you'd place the training images
- into the directory `~/invokeai/text-inversion-training-data/psychedelic`.
-
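- For example, on Linux/Mac you might prepare the "psychedelic" folder
- like this (a minimal sketch; the source image location is hypothetical):
-
- ```sh
- # Create the recommended training-data subdirectory for the trigger term
- mkdir -p ~/invokeai/text-inversion-training-data/psychedelic
-
- # Copy 3-5 example images into it (the source path here is illustrative)
- cp ~/Pictures/psychedelic-samples/*.png ~/invokeai/text-inversion-training-data/psychedelic/
- ```
-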
- ## **Launching Training Using the Console Front End**
-
- InvokeAI 2.3 and higher comes with a text console-based training front
- end. From within the `invoke.sh`/`invoke.bat` Invoke launcher script,
- start the training tool by selecting choice (3):
-
- ```sh
- 1 "Generate images with a browser-based interface"
- 2 "Explore InvokeAI nodes using a command-line interface"
- 3 "Textual inversion training"
- 4 "Merge models (diffusers type only)"
- 5 "Download and install models"
- 6 "Change InvokeAI startup options"
- 7 "Re-run the configure script to fix a broken install or to complete a major upgrade"
- 8 "Open the developer console"
- 9 "Update InvokeAI"
- ```
-
- Alternatively, you can select option (8) to open the developer console,
- or activate the InvokeAI virtual environment from your own command
- line; either way, you can then launch the front end with the command
- `invokeai-ti --gui`.
-
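- If you are launching from your own shell, the steps look roughly like
- this (a sketch; the `.venv` location assumes a default InvokeAI install
- in `~/invokeai`):
-
- ```sh
- # Activate InvokeAI's virtual environment (this path is an assumption;
- # adjust it if your runtime directory lives elsewhere)
- source ~/invokeai/.venv/bin/activate
-
- # Launch the textual inversion front end
- invokeai-ti --gui
- ```
-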
- This will launch a text-based front end that will look like this:
-
- <figure markdown>
- ![ti-frontend](../assets/textual-inversion/ti-frontend.png)
- </figure>
-
- The interface is keyboard-based. Move from field to field using
- control-N (^N) to move to the next field and control-P (^P) to move to
- the previous one. <Tab> and <shift-TAB> work as well. Once a field is
- active, use the cursor keys. In a checkbox group, use the up and down
- cursor keys to move from choice to choice, and <space> to select a
- choice. In a scrollbar, use the left and right cursor keys to increase
- or decrease its value. In text fields, type the desired values.
-
- The number of parameters may look intimidating, but in most cases the
- predefined defaults work fine. The red-circled fields in the above
- illustration are the ones you will adjust most frequently.
-
- ### Model Name
-
- This will list all the diffusers models that are currently
- installed. Select the one you wish to use as the basis for your
- embedding. Be aware that if you use an SD-1.X-based model for your
- training, you will only be able to use this embedding with other
- SD-1.X-based models. Similarly, if you train on SD-2.X, you will only
- be able to use the embedding with models based on SD-2.X.
-
- ### Trigger Term
-
- This is the prompt term you will use to trigger the embedding. Type a
- single word or phrase you wish to use as the trigger, for example
- "psychedelic" (without angle brackets). Within InvokeAI, you will then
- be able to activate the trigger using the syntax `<psychedelic>`.
-
- ### Initializer
-
- This is a single character that is used internally during the training
- process as a placeholder for the trigger term. It defaults to "*" and
- can usually be left alone.
-
- ### Resume from last saved checkpoint
-
- As training proceeds, textual inversion will write a series of
- intermediate files that can be used to resume training from where it
- left off in the case of an interruption. This checkbox will be
- automatically selected if you provide a previously used trigger term
- and at least one checkpoint file is found on disk.
-
- Note that as of 20 January 2023, resume does not seem to be working
- properly due to an issue with the upstream code.
-
- ### Data Training Directory
-
- This is the location of the images to be used for training. When you
- select a trigger term like "my-trigger", the frontend will prepopulate
- this field with `~/invokeai/text-inversion-training-data/my-trigger`,
- but you can change the path to wherever you want.
-
- ### Output Destination Directory
-
- This is the location of the logs, checkpoint files, and embedding
- files created during training. When you select a trigger term like
- "my-trigger", the frontend will prepopulate this field with
- `~/invokeai/text-inversion-output/my-trigger`, but you can change the
- path to wherever you want.
-
- ### Image resolution
-
- The images in the training directory will be automatically scaled to
- the value you use here. For best results, you will want to use the
- same resolution as the underlying model's default (512 pixels for
- SD-1.5, 768 for the larger version of SD-2.1).
-
- ### Center crop images
-
- If this is selected, your images will be center cropped to make them
- square before being resized to the desired resolution. Center cropping
- can indiscriminately cut off the top of subjects' heads in
- portrait-aspect images, so if you have images like this, you may wish
- to use a photo editor to manually crop them to a square aspect ratio.
-
- ### Mixed precision
-
- Select the floating point precision for the embedding. "no" will
- result in full 32-bit precision, "fp16" will provide 16-bit precision,
- and "bf16" will provide mixed precision (only available when XFormers
- is used). The same choice is exposed by the command-line script
- described below via the `--mixed_precision` flag.
-
- ### Max training steps
-
- How many steps the training will take before the model converges. Most
- training sets will converge with 2000-3000 steps.
-
- ### Batch size
-
- This adjusts how many training images are processed simultaneously in
- each step. Higher values will cause the training process to run more
- quickly, but use more memory. The default size will run on GPUs with
- as little as 12 GB of VRAM.
-
- ### Learning rate
-
- The rate at which the system adjusts its internal weights during
- training. Higher values risk overtraining (getting the same image each
- time), and lower values will take more steps to train a good
- model. The default of 0.0005 is conservative; you may wish to increase
- it to 0.005 to speed up training.
-
- ### Scale learning rate by number of GPUs, steps and batch size
-
- If this is selected (the default), the system will scale the learning
- rate you provide by the number of GPUs, the gradient accumulation
- steps, and the batch size, so that the effective learning rate keeps
- pace with the effective batch size.
-
- ### Use xformers acceleration
-
- This will activate XFormers memory-efficient attention. You need to
- have XFormers installed for this to have an effect.
-
- ### Learning rate scheduler
-
- This adjusts how the learning rate changes over the course of
- training. The default "constant" means to use a constant learning rate
- for the entire training session. The other values scale the learning
- rate according to various formulas.
-
- Only "constant" is supported by the XFormers library.
-
- ### Gradient accumulation steps
-
- This parameter allows you to simulate bigger batch sizes than your
- GPU's VRAM would ordinarily accommodate, at the cost of some
- performance: gradients are accumulated over several steps before the
- weights are updated, so the effective batch size is the batch size
- multiplied by the number of accumulation steps. For example, a batch
- size of 8 with 4 accumulation steps behaves like an effective batch
- size of 32.
-
- ### Warmup steps
-
- If "constant_with_warmup" is selected in the learning rate scheduler,
- then this provides the number of warmup steps. Warmup steps have a
- very low learning rate, and are one way of preventing early
- overtraining.
-
- ## The training run
-
- Start the training run by advancing to the OK button (bottom right)
- and pressing <enter>. A series of progress messages will be displayed
- as the training process proceeds. This may take an hour or two,
- depending on settings and the speed of your system. Various log and
- checkpoint files will be written into the output directory (ordinarily
- `~/invokeai/text-inversion-output/my-model/`).
-
- At the end of a successful training run, the system will copy the file
- `learned_embeds.bin` into the InvokeAI root directory's `embeddings`
- directory, using a subdirectory named after the trigger token. For
- example, if the trigger token was `psychedelic`, then look for the
- embeddings file in
- `~/invokeai/embeddings/psychedelic/learned_embeds.bin`.
-
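- You can confirm that the embedding was installed with a quick check
- (a sketch; the path assumes the "psychedelic" trigger used in this
- walkthrough and a default `~/invokeai` root directory):
-
- ```sh
- # List the installed embedding file for the "psychedelic" trigger
- ls -l ~/invokeai/embeddings/psychedelic/learned_embeds.bin
- ```
-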
- You may now launch InvokeAI and try out a prompt that uses the trigger
- term. For example: `a plate of banana sushi in <psychedelic> style`.
-
- ## **Training with the Command-Line Script**
-
- Training can also be done using a traditional command-line script. It
- can be launched from within the "developer's console", or from the
- command line after activating InvokeAI's virtual environment.
-
- It accepts a large number of arguments, which can be summarized by
- passing the `--help` argument:
-
- ```sh
- invokeai-ti --help
- ```
-
- Typical usage is shown here:
-
- ```sh
- invokeai-ti \
-   --model=stable-diffusion-1.5 \
-   --resolution=512 \
-   --learnable_property=style \
-   --initializer_token='*' \
-   --placeholder_token='<psychedelic>' \
-   --train_data_dir=/home/lstein/invokeai/text-inversion-training-data/psychedelic \
-   --output_dir=/home/lstein/invokeai/text-inversion-output/psychedelic \
-   --scale_lr \
-   --train_batch_size=8 \
-   --gradient_accumulation_steps=4 \
-   --max_train_steps=3000 \
-   --learning_rate=0.0005 \
-   --resume_from_checkpoint=latest \
-   --lr_scheduler=constant \
-   --mixed_precision=fp16 \
-   --only_save_embeds
- ```
-
- ## Troubleshooting
-
- ### `Cannot load embedding for <trigger>. It was trained on a model with token dimension 1024, but the current model has token dimension 768`
-
- Messages like this indicate that you trained the embedding on a different base model than the currently selected one.
-
- For example, in the error above, the embedding was trained on an SD-2.1-based model (token dimension 1024) but loaded into an SD-1.5-based model (token dimension 768).
-
- ## Reading
-
- For more information on textual inversion, please see the following
- resources:
-
- * The [textual inversion repository](https://github.com/rinongal/textual_inversion) and
-   associated paper for details and limitations.
- * [HuggingFace's textual inversion training
-   page](https://huggingface.co/docs/diffusers/training/text_inversion)
- * [HuggingFace example script
-   documentation](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion)
-   (Note that this script is similar, but not identical, to
-   `textual_inversion`, and produces embedding files that are completely compatible.)
-
- ---
-
- copyright (c) 2023, Lincoln Stein and the InvokeAI Development Team
+ You can find out more by visiting the repo at https://github.com/invoke-ai/invoke-training