context : fix index overflow on huge outputs #15080
Merged
+8 −8
This fixes a problem I've noticed when working on #15060 and running `llama-imatrix` with `-ub 32768 -b 32768` to compute 64 chunks (of 512 tokens) at once with https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507. (I think any model with a vocab size of at least 131073 tokens should trigger this problem, since 2**32 == 32768 * 131072; that one has a vocab size of 151936.)

At least two places can overflow on big batches:
- `llama.cpp/src/llama-context.cpp`, line 1132 in ec428b0
- `llama.cpp/src/llama-context.cpp`, line 1340 in ec428b0
This PR should fix that.
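For illustration, here is a minimal, self-contained sketch of the kind of overflow involved; the variable names and values are hypothetical (taken from the scenario above), not the actual llama.cpp code:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const int32_t n_outputs = 32768;   // -ub 32768: 64 chunks of 512 tokens
    const int32_t n_vocab   = 151936;  // vocab size of Qwen3-30B-A3B-Instruct-2507

    // Overflows: 32768 * 151936 = 4978638848 > INT32_MAX, so the 32-bit
    // product wraps around and the resulting offset into the logits buffer
    // is garbage (signed overflow is undefined behavior in C++).
    // int32_t bad_offset = n_outputs * n_vocab;

    // Fix: widen one operand before multiplying, so the product is
    // computed in 64 bits.
    const int64_t good_offset = (int64_t) n_outputs * n_vocab;
    printf("%lld\n", (long long) good_offset); // prints 4978638848

    return 0;
}
```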
Before (notice the high perplexity of later chunks in the huge batch):
After (looks more normal):