
Add Qwen3 #2249


Open · wants to merge 3 commits into master

Conversation

kanpuriyanawab (Collaborator) commented on May 10, 2025

This PR adds the Qwen3 backbone to the library.

What's pending?
Backbone outputs match with atol 1e-2.

A very small percentage of output logits differ by between 1e-3 and 1e-2, which causes the atol 1e-3 assertion to fail.

[image: output comparison showing the assertion failure]

^ This is with the 0.6B checkpoint; with the 4B checkpoint, it passes cleanly with atol 1e-3.
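
For reference, a minimal sketch of the kind of tolerance check being described, assuming the KerasHub and Hugging Face outputs have been dumped to NumPy arrays on identical inputs (the `.npy` file names are hypothetical, not from this PR):

```python
# Sketch of the tolerance check described above. The file names are
# hypothetical; any dump of the two models' outputs on identical inputs works.
import numpy as np

keras_logits = np.load("keras_qwen3_outputs.npy")  # KerasHub backbone outputs (assumed dump)
hf_logits = np.load("hf_qwen3_outputs.npy")        # Hugging Face outputs (assumed dump)

abs_diff = np.abs(keras_logits - hf_logits)

# Fraction of elements whose absolute error falls in (1e-3, 1e-2] --
# these are the ones that break the atol=1e-3 assertion.
mask = (abs_diff > 1e-3) & (abs_diff <= 1e-2)
print(f"elements in (1e-3, 1e-2]: {mask.mean():.4%}")

np.testing.assert_allclose(keras_logits, hf_logits, rtol=0, atol=1e-2)  # passes
np.testing.assert_allclose(keras_logits, hf_logits, rtol=0, atol=1e-3)  # fails for the 0.6B checkpoint
```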

kanpuriyanawab (Collaborator, Author) commented:

I checked multiple times and there is no implementation difference between the Keras and HF versions, except that we use EinsumDense layers.

The query dense layer causes a very small divergence, but it is amplified by the norm layer (and this norm layer, FYI, is the same one used in all our other LLMs -- Llama, Qwen2.5, and Qwen MoE, an exact copy-paste). I think the culprits are the weights.
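
A toy numeric sketch (not code from this PR, and assuming the norm in question is an RMSNorm-style layer as in Llama/Qwen) of the amplification effect: because the norm divides by the root-mean-square of the activations, a tiny absolute difference coming out of the query projection gets scaled up whenever the activations are small.

```python
# Toy illustration of error amplification by an RMSNorm-style layer:
# dividing by a small RMS scales absolute errors up.
import numpy as np

def rms_norm(x, eps=1e-6):
    # Plain RMSNorm without a learned scale, for illustration only.
    return x / np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
x = rng.normal(scale=0.05, size=(1, 64)).astype("float32")      # small-magnitude activations
delta = rng.normal(scale=1e-5, size=x.shape).astype("float32")  # tiny projection difference

before = np.abs(delta).max()
after = np.abs(rms_norm(x + delta) - rms_norm(x)).max()
print(f"max abs diff before norm: {before:.1e}, after norm: {after:.1e}")
```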

@divyashreepathihalli
