This repository was archived by the owner on Nov 3, 2021. It is now read-only.

Zero-sized memory.init of dropped segments should trap #124

Closed
tlively opened this issue Nov 8, 2019 · 16 comments

Comments

@tlively
Member

tlively commented Nov 8, 2019

I'm currently working on generalizing Binaryen's memory-packing pass to optimize passive segments in addition to the active segments it already optimizes. The main idea is that when there is a long run of zeroes in a data segment, the segment is split into two segments that do not contain those zeroes, and any memory.init instructions that reference the original segment are split into a sequence of memory.init and memory.fill instructions.

The problem is that if a memory.init instruction uses a section of a data segment that consists only of zeroes, it would naively be replaced with a single memory.fill instruction. But that would not be a correct transformation, because the memory.fill instruction would not trap if the original data segment had been dropped. Emulating the trapping behavior correctly would involve creating new globals to track the hidden state of each of the data segments that are split, which is unnecessarily complex. If zero-sized memory.init instructions could trap, a simpler solution would be to insert a zero-sized memory.init of one of the split data segments before the memory.fill.
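To make the rewrite concrete, here is a minimal sketch in C that models the instruction semantics at the time of writing; memory_init, memory_fill, Segment, and trap are illustrative stand-ins, not Binaryen or engine APIs:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

static uint8_t memory[65536];                    /* the linear memory          */
static void trap(const char *why) { fprintf(stderr, "trap: %s\n", why); exit(1); }

/* A passive data segment; the dropped state is a separate flag here.         */
typedef struct { const uint8_t *bytes; uint32_t len; int dropped; } Segment;

/* memory.init under the current semantics: zero-sized inits never trap,      */
/* even if the segment has been dropped. (Destination bounds check elided.)   */
static void memory_init(Segment *s, uint32_t dst, uint32_t src, uint32_t len) {
  if (len == 0) return;
  if (s->dropped || (uint64_t)src + len > s->len) trap("memory.init");
  memcpy(memory + dst, s->bytes + src, len);
}

/* memory.fill never consults any segment, so it cannot trap on dropped data. */
static void memory_fill(uint32_t dst, uint8_t value, uint32_t len) {
  memset(memory + dst, value, len);              /* bounds check elided        */
}

int main(void) {
  /* Original segment: "abc", then a long run of zeroes, then "xyz".          */
  /* After packing, only the non-zero runs are kept as segments.              */
  static const uint8_t head[] = {'a', 'b', 'c'}, tail[] = {'x', 'y', 'z'};
  Segment seg_head = {head, 3, 0}, seg_tail = {tail, 3, 0};
  uint32_t dst = 1024;

  /* The single original memory.init becomes an init/fill/init sequence.      */
  memory_init(&seg_head, dst, 0, 3);             /* leading non-zero bytes     */
  memory_fill(dst + 3, 0, 1000);                 /* the zero run, filled       */
  memory_init(&seg_tail, dst + 1003, 0, 3);      /* trailing non-zero bytes    */

  /* The problematic case: an init covering only zeroes naively becomes a     */
  /* bare memory.fill, which can never trap on a dropped segment. Inserting   */
  /* a zero-sized memory.init would restore the trap -- but only if           */
  /* zero-sized inits of dropped segments were allowed to trap.               */
  memory_init(&seg_head, dst, 0, 0);             /* no-op under current rules  */
  memory_fill(dst, 0, 1000);
  return 0;
}
```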

My understanding is that zero-length memory.init currently does not trap as a matter of convenience, so would anyone object to changing this behavior?

@rossberg
Member

The rationale for the current semantics is that a dropped segment behaves like a zero-sized one, and both failure conditions are essentially the same error category. Arguably, that is the simplest and most natural semantics, and it JITs to the simplest code. In fact, I was about to propose that we treat drop exactly like a shrink-to-zero, because that would get rid of having to track the dropped state as a separate piece of information, which is kind of annoying atm. (The only observable difference would be that dropping a segment twice would be a nop instead of a trap.)

So, hm, what you propose would make this simplification impossible.

Could you solve your use case by always keeping at least one byte for memory.fill? That may be slightly less elegant, but in practice, is it much different from doing a zero-sized fill?

Another question is why you even care about maintaining the error. The trap is a fatal failure, i.e., the change in semantics only affects broken programs. Does that matter for your use case?

Tangential question: Why only replace runs of zeroes? Wouldn't the same optimisation apply for any byte value?

@tlively
Member Author

tlively commented Nov 11, 2019

> Could you solve your use case by always keeping at least one byte for memory.fill? That may be slightly less elegant, but in practice, is it much different from doing a zero-sized fill?

This is certainly possible, but would be an extra complication in a pass that is already relatively complicated. I see the advantages for spec simplicity of specifying dropping as shrinking to zero, but in practice that will not make code any simpler because it simply moves the statefulness elsewhere. The change I proposed is a similar spec simplification although it leads to a different semantics. The cost of my proposed change is that engines would have to track dropped state explicitly instead of implementing drops as resizes to zero, but that seems a small cost to pay.

Basically the choice of simplification to make comes down to a decision about whether drop state should be kept explicit or should be implicit in the length of the segment.

> Another question is why you even care about maintaining the error. The trap is a fatal failure, i.e., the change in semantics only affects broken programs. Does that matter for your use case?

Binaryen optimizations in general try to preserve program semantics, including trapping behavior. Binaryen has a flag for ignoring implicit traps, but it is not on by default because we have found it to be too unsafe for general use. There is perhaps room to adjust the default policy in a safe way, but that would be a broader and separate discussion.

> Tangential question: Why only replace runs of zeroes? Wouldn't the same optimisation apply for any byte value?

Yes, it certainly would. I'm starting with zeroes because this is a pre-existing optimization pass that similarly splits active segments around runs of zeroes, and for active segments only zeroes make sense.

@rossberg
Member

@tlively:

> I see the advantages for spec simplicity of specifying dropping as shrinking to zero, but in practice that will not make code any simpler because it simply moves the statefulness elsewhere. The change I proposed is a similar spec simplification although it leads to a different semantics.

I don't think that's correct. As you observe, what I propose would simplify the code that engines have to generate and the state they have to maintain: there would be only one variable and one condition to check. It removes a case. In contrast, your proposal adds a case.

How bad is the complication to the optimisation? Naively, it doesn't seem that hard to stop at 1 vs 0.

@tlively
Member Author

tlively commented Nov 12, 2019

In the following tables zero-length implies "in bounds." The current cases:

| in bounds? | dropped? | zero-length? | Result |
|------------|----------|--------------|--------|
| ✔️         | ✔️       | ✔️           | OK     |
| ✔️         | ✔️       |              | trap   |
| ✔️         |          | ✔️           | OK     |
| ✔️         |          |              | OK     |
|            | ✔️       |              | trap   |
|            |          |              | trap   |
If zero-length memory.inits are allowed to trap, this becomes:

| in bounds? | dropped? | zero-length? | Result |
|------------|----------|--------------|--------|
| ✔️         | ✔️       | ✔️           | trap   |
| ✔️         | ✔️       |              | trap   |
| ✔️         |          | ✔️           | OK     |
| ✔️         |          |              | OK     |
|            | ✔️       |              | trap   |
|            |          |              | trap   |

But the first two columns contain enough information to determine the result, so this simplifies to:

| in bounds? | dropped? | Result |
|------------|----------|--------|
| ✔️         | ✔️       | trap   |
| ✔️         |          | OK     |
|            | ✔️       | trap   |
|            |          | trap   |

So I think it is indisputable that allowing zero-length memory.inits to trap yields simpler semantics, but I acknowledge that there is a real trade-off with spec complexity here and that these simpler semantics would require engines to separately track the dropped state. I would be interested in hearing what other people think as well, particularly engine implementors. If engine implementors prefer the truncation approach, I would be happy to go with that because I think ultimately there will be more engines than tools that have to consider these edge cases.

@eqrion
Contributor

eqrion commented Nov 13, 2019

If I understand the discussion here, I don't believe 'data.drop shrinks the segment to 0' would simplify SpiderMonkey's implementation. We use refcounted immutable data segments shared between threads, so we cannot mutate the byte vector to drop it. That means we'll always need to check whether the data segment has been dropped before looking at its length.

I do think that giving precedence to zero-sized input over dropped state when determining trapping might be convenient for implementations in the future: if they can determine from analysis that len=0, they could emit no code at all. Otherwise they'd have to emit a stub check that traps if the segment was dropped but performs no copy.

Today we just use an OOL (out-of-line) call with no analysis, so whichever way is chosen would only involve flipping the order of two if statements. I don't think either way is significantly better, so I don't have a strong opinion.

@rossberg
Member

rossberg commented Nov 14, 2019

@tlively, if I interpret your argument correctly, then what I am saying is in fact that simplifying dropping to shrink-to-zero turns the table into the bare minimum:

| in bounds? | Result |
|------------|--------|
| ✔️         | OK     |
|            | trap   |

@eqrion, you wouldn't mutate the byte vector itself, you would mutate the pointer to point to a singleton zero-sized one (instead of nulling it). Dropping itself is pretty much the same, but you save an extra null check in every use of the segment.
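A minimal sketch of that idea in C; the types and names are illustrative, not SpiderMonkey's actual implementation:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A refcounted, immutable byte vector shared between threads (refcounting    */
/* elided). The bytes themselves are never mutated.                           */
typedef struct { const uint8_t *bytes; uint32_t len; } ByteVec;

/* One shared, permanently empty vector that every dropped segment points to. */
static const ByteVec empty_vec = { NULL, 0 };

typedef struct { const ByteVec *data; } Segment;

/* data.drop: swap the pointer to the empty singleton instead of nulling it,  */
/* so users of the segment only ever look at data->len -- no null or          */
/* "dropped" check is needed.                                                 */
static void data_drop(Segment *s) {
  /* a refcounted implementation would release the old vector here */
  s->data = &empty_vec;
}

int main(void) {
  static const uint8_t bytes[] = { 1, 2, 3 };
  ByteVec vec = { bytes, 3 };
  Segment seg = { &vec };
  data_drop(&seg);
  printf("segment length after drop: %u\n", (unsigned)seg.data->len);  /* 0 */
  return 0;
}
```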

@tlively
Member Author

tlively commented Nov 14, 2019

@rossberg Comparing that logic table with the previous logic tables is apples to oranges because all the complexity has been hidden in computing the bounds, and in particular you have made the segment length a mutable property. Perhaps I should have formalized the columns as predicates over the initial conditions and a program trace to prevent the complexity from being hidden by changed definitions.

@rossberg
Member

@tlively, possibly; I wasn't sure how exactly to interpret your tables, or why a column can be elided in the end.

To clarify, with the semantics I propose the code generated for an init would be

  1. if len = 0, do nothing
  2. else if offset+len > segment.len, trap
  3. else, copy

If you need to distinguish dropped segments it will have to be

  1. if segment.dropped, trap
  2. else if len = 0, do nothing
  3. else if offset+len > segment.len, trap
  4. else, copy

That is, one extra check and error case and one extra state flag.

In fact, now that we have reintroduced the upfront bounds check I wonder whether special-casing zero is still worth it. We could go to the bare minimum:

  1. if offset+len > segment.len, trap
  2. else, copy
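A sketch of how that bare minimum could look in generated code, assuming data.drop shrinks the segment to length 0 (illustrative C, not any particular engine's codegen):

```c
#include <stdint.h>
#include <string.h>
#include <stdlib.h>

static uint8_t memory[65536];
static void trap(void) { exit(1); }

/* Illustrative segment type; data.drop is assumed to set len to 0. */
typedef struct { const uint8_t *bytes; uint32_t len; } Segment;

/* One comparison covers out-of-bounds, dropped, and (nonzero-offset)         */
/* zero-length cases alike; there is no separate dropped flag to check.       */
static void memory_init_bare(const Segment *s, uint32_t dst,
                             uint32_t src, uint32_t len) {
  if ((uint64_t)src + len > s->len) trap();     /* dropped => s->len == 0     */
  memcpy(memory + dst, s->bytes + src, len);    /* memory bounds check elided */
}

int main(void) {
  static const uint8_t bytes[] = {1, 2, 3};
  Segment live = {bytes, 3}, dropped = {bytes, 0};
  memory_init_bare(&live, 0, 0, 3);             /* ok                         */
  memory_init_bare(&dropped, 0, 1, 0);          /* traps: 1 + 0 > 0           */
  return 0;
}
```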

@tlively
Member Author

tlively commented Nov 15, 2019

Interesting! Would a memory.init with zero offset and zero length of a dropped segment trap? I would prefer that it does, but my guess is that you would prefer that it doesn’t.

Due to other decisions, my work on Binaryen is no longer blocked on resolving this issue. We both agree that there is room to simplify the current memory.init semantics and the details are no longer so important to me. I will be happy as long as some simplification occurs.

@eqrion
Contributor

eqrion commented Nov 15, 2019

> @eqrion, you wouldn't mutate the byte vector itself, you would mutate the pointer to point to a singleton zero-sized one (instead of nulling it). Dropping itself is pretty much the same, but you save an extra null check in every use of the segment.

Ah interesting. Those semantics make sense and would simplify the code.

> In fact, now that we have reintroduced the upfront bounds check I wonder whether special-casing zero is still worth it. We could go to the bare minimum:
>
>   1. if offset+len > segment.len, trap
>   2. else, copy

I wasn't around for the decision to special-case zero. Was there a code-perf reason for it, or was it just to reduce spec complexity? Maybe @lars-t-hansen knows.

@rossberg
Member

@tlively, yes! I just realised lying awake in bed that my further simplification would actually give you what you want! So we should propose that.

@eqrion, it was me who proposed that the null (zero-size) case wouldn't trap, because it was a simplification at the time. However, that was basically turned around by our recent decision to revert to checking OOB beforehand, because that check now needs an exception for the 0-sized case. So my suggestion now is to also revert the 0-size behaviour and regain overall consistency.

@eqrion
Contributor

eqrion commented Nov 20, 2019

@rossberg I talked about this with @lars-t-hansen today and I think we're both okay with this change.

How much consensus would we need to make this change?

@rossberg
Member

Okay, it sounds like the main stakeholders are on board. In that case, we could probably go ahead and create a PR -- I'm happy to do the implementation and spec, but might appreciate some help with the test case generation. (We could instead bring it up at a CG meeting first, but unfortunately I will not be able to attend the next one.)

@rossberg
Member

Okay, I started a PR: #126.

@rossberg
Member

#126 landed. @tlively, does that mean that this issue can be closed?

@tlively
Member Author

tlively commented Nov 23, 2019

Yes, thanks for a productive discussion 👍

@tlively tlively closed this as completed Nov 23, 2019
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Nov 27, 2019
…ength past end of bounds. r=lth

Spec Issue: WebAssembly/bulk-memory-operations#124

The inline path for memory.copy/fill is updated to fall back to the OOL path
when the length is 0, to get proper bounds-checking behavior.

Differential Revision: https://phabricator.services.mozilla.com/D54599

kripken pushed a commit to WebAssembly/binaryen that referenced this issue Aug 24, 2020
According to changes in spec:
WebAssembly/bulk-memory-operations#124
WebAssembly/bulk-memory-operations#145

we unfortunately can't fold to a nop even for memory.copy(x, y, 0).

So this PR reverts all such reductions to nop, keeping them only under the ignoreImplicitTraps flag.
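For context, a sketch of why the fold is no longer valid under the updated semantics (illustrative C, not Binaryen or engine code):

```c
#include <stdint.h>
#include <string.h>
#include <stdlib.h>

static uint8_t memory[65536];
static void trap(void) { exit(1); }

/* memory.copy after the spec changes above: bounds are checked even when     */
/* len == 0, so a zero-length copy with an out-of-range address still traps   */
/* and cannot be folded to a nop without changing behavior.                   */
static void memory_copy(uint32_t dst, uint32_t src, uint32_t len) {
  if ((uint64_t)dst + len > sizeof memory ||
      (uint64_t)src + len > sizeof memory) trap();
  memmove(memory + dst, memory + src, len);
}

int main(void) {
  memory_copy(0, 0, 0);          /* ok: in bounds, nothing copied             */
  memory_copy(1 << 20, 0, 0);    /* traps despite len == 0                    */
  return 0;
}
```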