What does this PR do?

This PR fixes the `--train_text_encoder` flag in `train_dreambooth_sd3.py`, which previously failed during training because `encode_prompt` and `_encode_prompt_with_t5` did not handle the cases where `tokenizers` or `text_input_ids` is `None`.

When `--train_text_encoder` is enabled, the training pipeline pre-tokenizes the prompts once and passes `text_input_ids_list` directly, to avoid retokenizing on every step. The original code, however, assumed `tokenizers` was always available, causing crashes such as:

```
AttributeError: 'NoneType' object has no attribute 'encode'
```

or

```
ValueError: text_input_ids must be provided...
```
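For context, a minimal sketch of that pre-tokenization pattern (helper names and call arguments are illustrative, not the exact script code):

```python
# Pre-tokenize the instance prompt once, before the training loop
# (tokenize_prompt / encode_prompt names follow the script; arguments are simplified).
tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt)
tokens_two = tokenize_prompt(tokenizer_two, args.instance_prompt)
tokens_three = tokenize_prompt(tokenizer_three, args.instance_prompt)

for batch in train_dataloader:
    # Passing None tokenizers tells encode_prompt to fall back to the
    # precomputed ids instead of retokenizing on every step.
    prompt_embeds, pooled_prompt_embeds = encode_prompt(
        text_encoders=[text_encoder_one, text_encoder_two, text_encoder_three],
        tokenizers=[None, None, None],
        prompt=args.instance_prompt,
        text_input_ids_list=[tokens_one, tokens_two, tokens_three],
    )
```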

This PR:

✅ Makes `_encode_prompt_with_t5` and `encode_prompt` robust to a `None` tokenizer (see the sketch after this list) by:

  • Accepting precomputed `text_input_ids` as a fallback
  • Raising a clear `ValueError` when the tokenizer is missing and no `text_input_ids` are provided
  • Preserving the batch-size logic even when `prompt` is `None`

✅ Ensures the training and inference code paths stay compatible
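A simplified sketch of the resulting fallback logic (the real helper also handles dtype casting and joint-attention details; treat this as illustrative, not the exact diff):

```python
def _encode_prompt_with_t5(
    text_encoder,
    tokenizer,
    max_sequence_length,
    prompt=None,
    num_images_per_prompt=1,
    device=None,
    text_input_ids=None,
):
    prompt = [prompt] if isinstance(prompt, str) else prompt

    if tokenizer is not None:
        # Inference path: tokenize the prompt on the fly.
        text_inputs = tokenizer(
            prompt,
            padding="max_length",
            max_length=max_sequence_length,
            truncation=True,
            return_tensors="pt",
        )
        text_input_ids = text_inputs.input_ids
    elif text_input_ids is None:
        # Training path: --train_text_encoder passes precomputed ids instead.
        # Fail loudly here rather than crashing deep inside the text encoder.
        raise ValueError("text_input_ids must be provided when the tokenizer is not specified.")

    # The batch size survives a None prompt because the precomputed ids carry it.
    batch_size = len(prompt) if prompt is not None else text_input_ids.shape[0]

    prompt_embeds = text_encoder(text_input_ids.to(device))[0]

    # Duplicate embeddings per requested image, keeping the batch dimension consistent.
    _, seq_len, _ = prompt_embeds.shape
    prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
    prompt_embeds = prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)
    return prompt_embeds
```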

Fixes #8507
