
Add Qwen3 #2249


Open · wants to merge 3 commits into master

Conversation

kanpuriyanawab (Collaborator) commented on May 10, 2025

This PR adds the Qwen3 backbone to the library.

What's pending?
Backbone outputs match with atol 1e-2.

A very small percentage of output logits differ by between 1e-3 and 1e-2, which causes the atol 1e-3 assertion to fail.

[image: output comparison showing the assertion failure]

^ This is with the 0.6B checkpoint; with the 4B checkpoint, it passes cleanly with atol 1e-3.
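
For reference, a minimal sketch of the kind of tolerance check being described, assuming the KerasHub and Hugging Face outputs have been dumped to NumPy arrays on identical inputs (the `.npy` file names are hypothetical, not from this PR):

```python
# Sketch of the tolerance check described above. The file names are
# hypothetical; any dump of the two models' outputs on identical inputs works.
import numpy as np

keras_logits = np.load("keras_qwen3_outputs.npy")  # KerasHub backbone outputs (assumed dump)
hf_logits = np.load("hf_qwen3_outputs.npy")        # Hugging Face outputs (assumed dump)

abs_diff = np.abs(keras_logits - hf_logits)

# Fraction of elements whose absolute error falls in (1e-3, 1e-2] --
# these are the ones that break the atol=1e-3 assertion.
mask = (abs_diff > 1e-3) & (abs_diff <= 1e-2)
print(f"elements in (1e-3, 1e-2]: {mask.mean():.4%}")

np.testing.assert_allclose(keras_logits, hf_logits, rtol=0, atol=1e-2)  # passes
np.testing.assert_allclose(keras_logits, hf_logits, rtol=0, atol=1e-3)  # fails for the 0.6B checkpoint
```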

kanpuriyanawab (Collaborator, Author) commented:

I checked multiple times and there is no implementation difference between the Keras and HF versions, except that we use EinsumDense layers.

The query dense layer causes a very small divergence, but it is amplified by the norm layer (and this norm layer, FYI, is the same one used in all our other LLMs -- Llama, Qwen2.5, and Qwen MoE, an exact copy-paste). I think the culprits are the weights.
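
A toy numeric sketch (not code from this PR, and assuming the norm in question is an RMSNorm-style layer as in Llama/Qwen) of the amplification effect: because the norm divides by the root-mean-square of the activations, a tiny absolute difference coming out of the query projection gets scaled up whenever the activations are small.

```python
# Toy illustration of error amplification by an RMSNorm-style layer:
# dividing by a small RMS scales absolute errors up.
import numpy as np

def rms_norm(x, eps=1e-6):
    # Plain RMSNorm without a learned scale, for illustration only.
    return x / np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
x = rng.normal(scale=0.05, size=(1, 64)).astype("float32")      # small-magnitude activations
delta = rng.normal(scale=1e-5, size=x.shape).astype("float32")  # tiny projection difference

before = np.abs(delta).max()
after = np.abs(rms_norm(x + delta) - rms_norm(x)).max()
print(f"max abs diff before norm: {before:.1e}, after norm: {after:.1e}")
```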

@divyashreepathihalli
