context : fix index overflow on huge outputs #15080

Merged
merged 2 commits into master on Aug 5, 2025

Conversation

@compilade (Collaborator) commented on Aug 5, 2025

This fixes a problem I noticed while working on #15060, running llama-imatrix with -ub 32768 -b 32768 to compute 64 chunks (of 512 tokens) at once with https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507. I think any model with a vocab size of at least 131073 tokens should trigger this problem (since 32768 * 131072 == 2**32); that model has a vocab size of 151936.
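
For illustration, here is a minimal sketch of the arithmetic, assuming the element count is computed with 32-bit intermediates (the usual cause of this kind of overflow); the variable names are only for the example:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const int64_t n_outputs = 32768;   // -ub 32768, i.e. 64 chunks of 512 tokens
    const int64_t n_vocab   = 151936;  // vocab size of Qwen3-30B-A3B-Instruct-2507

    // the intended number of logit values: 4,978,638,848, which does not fit in 32 bits
    const int64_t n_elements = n_outputs * n_vocab;

    // truncating the product to 32 bits (what a 32-bit multiplication effectively does) wraps it around
    const uint32_t wrapped = (uint32_t) n_elements;

    printf("intended: %lld elements, wrapped: %u elements\n", (long long) n_elements, wrapped);
    return 0;
}
```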

At least two places can overflow on big batches:

ggml_backend_tensor_get_async(backend_res, t_logits, logits_out, 0, n_outputs*n_vocab*sizeof(float));

and

std::swap(logits[i0*n_vocab + k], logits[i1*n_vocab + k]);

This PR fixes both.
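
The fix amounts to widening these computations before the multiplication happens; roughly, the idea is the following sketch (the exact placement of the casts in the PR may differ):

```cpp
// compute the copy size in size_t so n_outputs*n_vocab cannot wrap in 32-bit int
ggml_backend_tensor_get_async(backend_res, t_logits, logits_out, 0,
        (size_t) n_outputs * n_vocab * sizeof(float));

// likewise, compute the swap indices in size_t when re-ordering the outputs
std::swap(logits[(size_t) i0*n_vocab + k], logits[(size_t) i1*n_vocab + k]);
```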

Before (notice the high perplexity of later chunks in the huge batch):

compute_imatrix: 1096.04 seconds per pass - ETA 36.53 minutes
[1]4.9921,[2]3.5230,[3]3.2906,[4]3.6947,[5]3.5449,[6]3.1976,[7]3.5389,[8]3.5263,[9]6.2084,[10]17.0548,[11]38.9870,[12]77.6513,[13]139.1047,[14]229.2792,[15]353.5501,[16]516.4480,[17]721.5067,[18]971.2237,[19]1267.1001,[20]1609.7314,[21]1998.9256,[22]2433.8312,[23]2913.0640,[24]3434.8252,[25]3997.0067,[26]4597.2832,[27]5233.1890,[28]5902.1827,[29]6601.6984,[30]7329.1878,[31]8082.1512,[32]8858.1627,[33]9654.8871,[34]10470.0929,[35]11301.6604,[36]12147.5862,[37]13005.9851,[38]13875.0902,[39]14753.2511,[40]15638.9308,[41]16530.7019,[42]17427.2419,[43]18327.3288,[44]19229.8356,[45]20133.7248,[46]21038.0440,[47]21941.9197,[48]22844.5530,[49]23745.2143,[50]24643.2388,[51]25538.0218,[52]26429.0147,[53]27315.7209,[54]28197.6915,[55]29074.5227,[56]29945.8515,[57]30811.3532,[58]31670.7381,[59]32523.7490,[60]33370.1584,[61]34209.7666,[62]35042.3990,[63]35867.9044,[64]36686.1527,

After (looks more normal):

compute_imatrix: 1000.44 seconds per pass - ETA 33.33 minutes
[1]4.9921,[2]3.5230,[3]3.2906,[4]3.6947,[5]3.5449,[6]3.1976,[7]3.5389,[8]3.5263,[9]3.9438,[10]3.8663,[11]3.8256,[12]4.2541,[13]4.8149,[14]5.0722,[15]5.5042,[16]5.8128,[17]6.0388,[18]6.4330,[19]6.2236,[20]6.3602,[21]6.3347,[22]6.3306,[23]6.2010,[24]6.4108,[25]6.6068,[26]6.5006,[27]6.5882,[28]6.6548,[29]6.8055,[30]6.7468,[31]6.5401,[32]6.2833,[33]6.1323,[34]6.0287,[35]5.9729,[36]5.9470,[37]5.9240,[38]5.9597,[39]5.9534,[40]6.1014,[41]6.1480,[42]6.3101,[43]6.4466,[44]6.6048,[45]6.7291,[46]6.8099,[47]6.7464,[48]6.8177,[49]6.8992,[50]6.9471,[51]6.8584,[52]6.9443,[53]7.0865,[54]7.1735,[55]7.2375,[56]7.3005,[57]7.3728,[58]7.4331,[59]7.4543,[60]7.4642,[61]7.4415,[62]7.4002,[63]7.4459,[64]7.5015,


@compilade added the bugfix label on Aug 5, 2025
@CISC merged commit ee3a9fc into master on Aug 5, 2025
45 of 47 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Aug 5, 2025
* context : fix overflow when re-ordering huge outputs

* context : fix logits size overflow for huge batches