fix init weights issue for critic/reward model #983

Merged
merged 5 commits into from
Jul 8, 2025

jouw
Contributor

@jouw jouw commented Jul 3, 2025

Add the following code to disable weight initialization. Without it, the model weights are initialized and an error occurs. @hwchen2017

with no_init_weights():
    model = model_class.from_config(model_config)

A detailed explanation follows.

Taking Qwen3Model as an example, the call stack is:
Qwen3Model.__init__() -> Qwen3Model.post_init() -> PreTrainedModel.init_weights()

If the call model = model_class.from_config(model_config) is not wrapped in with no_init_weights():, the module-level flag _init_weights stays true, so init_weights() goes on to initialize the weights, which causes the error.

https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py

def init_weights(self):
    """
    If needed prunes and maybe initializes weights. If using a custom `PreTrainedModel`, you need to implement any
    initialization logic in `_init_weights`.
    """
    # Prune heads if needed
    if self.config.pruned_heads:
        self.prune_heads(self.config.pruned_heads)

    if _init_weights:
        # Initialize weights
        self.initialize_weights()

        # Tie weights should be skipped when not initializing all weights
        # since from_pretrained(...) calls tie weights anyways
        self.tie_weights()

@jouw jouw requested a review from tjruwase as a code owner July 3, 2025 07:35
@hwchen2017
Contributor

Hi @jouw, can you fix the format and DCO error?

@jouw jouw force-pushed the fix-init-weights-issue branch from e26fe55 to 9a7062b Compare July 6, 2025 10:31
@jouw
Contributor Author

jouw commented Jul 6, 2025

> Hi @jouw, can you fix the format and DCO error?

Hi @hwchen2017, I have fixed the error. Please review the change, thanks!

@jouw
Contributor Author

jouw commented Jul 8, 2025

> Hi @jouw, can you fix the format and DCO error?

Hi @hwchen2017, I have fixed the errors. Can you merge the change? Thanks!

@hwchen2017 hwchen2017 merged commit 3d83278 into deepspeedai:master Jul 8, 2025
2 checks passed
@tohtana
Contributor

tohtana commented Jul 18, 2025

Hi @jouw,

It seems this breaks our CI test using DS-Chat. Can you share more about the error you encountered?
If a proper fix will take time, can we revert this change to unblock PRs on the DeepSpeed repo?

@PKUWZP
Contributor

PKUWZP commented Jul 29, 2025

Let's revert this PR and fix it.

hwchen2017 added a commit that referenced this pull request Jul 29, 2025
hwchen2017 added a commit that referenced this pull request Jul 29, 2025
5 participants