This repository was archived by the owner on Nov 3, 2021. It is now read-only.

Zero-sized memory.init of dropped segments should trap #124

Closed
tlively opened this issue Nov 8, 2019 · 16 comments

Comments

@tlively
Member

tlively commented Nov 8, 2019

I'm currently working on generalizing Binaryen's memory-packing pass to optimize passive segments in addition to the active segments it already optimizes. The main idea is that when there is a long run of zeroes in a data segment, the segment is split into two segments that do not contain those zeroes, and any memory.init instructions that reference the original segment are split into a sequence of memory.init and memory.fill instructions.

The problem is that if a memory.init instruction uses a section of a data segment that consists only of zeroes, it would naively be replaced with a single memory.fill instruction. But that would not be a correct transformation, because the memory.fill instruction would not trap if the original data segment had been dropped. Emulating the trapping behavior correctly would involve creating new globals to track the hidden state of each of the data segments that are split, which is unnecessarily complex. If zero-sized memory.init instructions could trap, a simpler solution would be to insert a zero-sized memory.init of one of the split data segments before the memory.fill.
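To make the rewrite concrete, here is a minimal sketch in C that models the instruction semantics at the time of writing; memory_init, memory_fill, Segment, and trap are illustrative stand-ins, not Binaryen or engine APIs:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

static uint8_t memory[65536];                    /* the linear memory          */
static void trap(const char *why) { fprintf(stderr, "trap: %s\n", why); exit(1); }

/* A passive data segment; the dropped state is a separate flag here.         */
typedef struct { const uint8_t *bytes; uint32_t len; int dropped; } Segment;

/* memory.init under the current semantics: zero-sized inits never trap,      */
/* even if the segment has been dropped. (Destination bounds check elided.)   */
static void memory_init(Segment *s, uint32_t dst, uint32_t src, uint32_t len) {
  if (len == 0) return;
  if (s->dropped || (uint64_t)src + len > s->len) trap("memory.init");
  memcpy(memory + dst, s->bytes + src, len);
}

/* memory.fill never consults any segment, so it cannot trap on dropped data. */
static void memory_fill(uint32_t dst, uint8_t value, uint32_t len) {
  memset(memory + dst, value, len);              /* bounds check elided        */
}

int main(void) {
  /* Original segment: "abc", then a long run of zeroes, then "xyz".          */
  /* After packing, only the non-zero runs are kept as segments.              */
  static const uint8_t head[] = {'a', 'b', 'c'}, tail[] = {'x', 'y', 'z'};
  Segment seg_head = {head, 3, 0}, seg_tail = {tail, 3, 0};
  uint32_t dst = 1024;

  /* The single original memory.init becomes an init/fill/init sequence.      */
  memory_init(&seg_head, dst, 0, 3);             /* leading non-zero bytes     */
  memory_fill(dst + 3, 0, 1000);                 /* the zero run, filled       */
  memory_init(&seg_tail, dst + 1003, 0, 3);      /* trailing non-zero bytes    */

  /* The problematic case: an init covering only zeroes naively becomes a     */
  /* bare memory.fill, which can never trap on a dropped segment. Inserting   */
  /* a zero-sized memory.init would restore the trap -- but only if           */
  /* zero-sized inits of dropped segments were allowed to trap.               */
  memory_init(&seg_head, dst, 0, 0);             /* no-op under current rules  */
  memory_fill(dst, 0, 1000);
  return 0;
}
```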

My understanding is that zero-length memory.init currently does not trap as a matter of convenience, so would anyone object to changing this behavior?

@rossberg
Member

The rationale for the current semantics is that a dropped segment behaves like a zero-sized one, and both failure conditions are essentially the same error category. Arguably, that is the simplest and most natural semantics, and it JITs to the simplest code. In fact, I was about to propose that we treat drop exactly like a shrink-to-zero, because that would get rid of having to track the dropped state as a separate piece of information, which is kind of annoying atm. (The only observable difference would be that dropping a segment twice would be a nop instead of a trap.)

So, hm, what you propose would make this simplification impossible.

Could you solve your use case by always keeping at least one byte for memory.fill? That may be slightly less elegant, but in practice, is it much different from doing a zero-sized fill?

Another question is why you even care about maintaining the error. The trap is a fatal failure, i.e., the change in semantics only affects broken programs. Does that matter for your use case?

Tangential question: Why only replace runs of zeroes? Wouldn't the same optimisation apply for any byte value?

@tlively
Member Author

tlively commented Nov 11, 2019

> Could you solve your use case by always keeping at least one byte for memory.fill? That may be slightly less elegant, but in practice, is it much different from doing a zero-sized fill?

This is certainly possible, but would be an extra complication in a pass that is already relatively complicated. I see the advantages for spec simplicity of specifying dropping as shrinking to zero, but in practice that will not make code any simpler because it simply moves the statefulness elsewhere. The change I proposed is a similar spec simplification although it leads to a different semantics. The cost of my proposed change is that engines would have to track dropped state explicitly instead of implementing drops as resizes to zero, but that seems a small cost to pay.

Basically the choice of simplification to make comes down to a decision about whether drop state should be kept explicit or should be implicit in the length of the segment.

> Another question is why you even care about maintaining the error. The trap is a fatal failure, i.e., the change in semantics only affects broken programs. Does that matter for your use case?

Binaryen optimizations in general try to preserve program semantics, including trapping behavior. Binaryen has a flag for ignoring implicit traps, but it is not on by default because we have found it to be too unsafe for general use. There is perhaps room to adjust the default policy in a safe way, but that would be a broader and separate discussion.

> Tangential question: Why only replace runs of zeroes? Wouldn't the same optimisation apply for any byte value?

Yes, it certainly would. I'm starting with zeroes because this is a pre-existing optimization pass that similarly splits active segments around runs of zeroes, and for active segments only zeroes make sense.

@rossberg
Member

@tlively:

> I see the advantages for spec simplicity of specifying dropping as shrinking to zero, but in practice that will not make code any simpler because it simply moves the statefulness elsewhere. The change I proposed is a similar spec simplification although it leads to a different semantics.

I don't think that's correct. As you observe, what I propose would simplify the code that engines have to generate and the state they have to maintain: there would be only one variable and one condition to check. It removes a case. In contrast, your proposal adds a case.

How bad is the complication to the optimisation? Naively, it doesn't seem that hard to stop at 1 vs 0.

@tlively
Member Author

tlively commented Nov 12, 2019

In the following tables zero-length implies "in bounds." The current cases:

| in bounds? | dropped? | zero-length? | Result |
|------------|----------|--------------|--------|
| ✔️         | ✔️       | ✔️           | OK     |
| ✔️         | ✔️       |              | trap   |
| ✔️         |          | ✔️           | OK     |
| ✔️         |          |              | OK     |
|            | ✔️       |              | trap   |
|            |          |              | trap   |
If zero-length memory.inits are allowed to trap, this becomes:

| in bounds? | dropped? | zero-length? | Result |
|------------|----------|--------------|--------|
| ✔️         | ✔️       | ✔️           | trap   |
| ✔️         | ✔️       |              | trap   |
| ✔️         |          | ✔️           | OK     |
| ✔️         |          |              | OK     |
|            | ✔️       |              | trap   |
|            |          |              | trap   |

But the first two columns contain enough information to determine the result, so this simplifies to:

| in bounds? | dropped? | Result |
|------------|----------|--------|
| ✔️         | ✔️       | trap   |
| ✔️         |          | OK     |
|            | ✔️       | trap   |
|            |          | trap   |

So I think it is indisputable that allowing zero-length memory.inits to trap yields simpler semantics, but I acknowledge that there is a real trade-off with spec complexity here and that these simpler semantics would require engines to separately track the dropped state. I would be interested in hearing what other people think as well, particularly engine implementors. If engine implementors prefer the truncation approach, I would be happy to go with that because I think ultimately there will be more engines than tools that have to consider these edge cases.

@eqrion
Contributor

eqrion commented Nov 13, 2019

If I understand the discussion here, I don't believe 'data.drop shrinks the segment to 0' would simplify SpiderMonkey's implementation. We use refcounted immutable data segments shared between threads, so we cannot mutate the byte vector to drop it. That means we'll always need to check whether the data segment has been dropped before looking at its length.

I do think that giving precedence to zero-sized input over dropped state when determining trapping might be convenient for implementations in the future: if they can determine from analysis that len=0, they could emit no code at all. Otherwise they'd have to emit a stub check that traps if the segment was dropped but performs no copy.

Today we just use an OOL (out-of-line) call with no analysis, so whichever way is chosen would only involve flipping the order of two if statements. I don't think either way is significantly better, so I don't have a strong opinion.

@rossberg
Member

rossberg commented Nov 14, 2019

@tlively, if I interpret your argument correctly, then what I am saying is in fact that simplifying dropping to shrink-to-zero turns the table into the bare minimum:

| in bounds? | Result |
|------------|--------|
| ✔️         | OK     |
|            | trap   |

@eqrion, you wouldn't mutate the byte vector itself, you would mutate the pointer to point to a singleton zero-sized one (instead of nulling it). Dropping itself is pretty much the same, but you save an extra null check in every use of the segment.
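A minimal sketch of that idea in C; the types and names are illustrative, not SpiderMonkey's actual implementation:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A refcounted, immutable byte vector shared between threads (refcounting    */
/* elided). The bytes themselves are never mutated.                           */
typedef struct { const uint8_t *bytes; uint32_t len; } ByteVec;

/* One shared, permanently empty vector that every dropped segment points to. */
static const ByteVec empty_vec = { NULL, 0 };

typedef struct { const ByteVec *data; } Segment;

/* data.drop: swap the pointer to the empty singleton instead of nulling it,  */
/* so users of the segment only ever look at data->len -- no null or          */
/* "dropped" check is needed.                                                 */
static void data_drop(Segment *s) {
  /* a refcounted implementation would release the old vector here */
  s->data = &empty_vec;
}

int main(void) {
  static const uint8_t bytes[] = { 1, 2, 3 };
  ByteVec vec = { bytes, 3 };
  Segment seg = { &vec };
  data_drop(&seg);
  printf("segment length after drop: %u\n", (unsigned)seg.data->len);  /* 0 */
  return 0;
}
```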

@tlively
Member Author

tlively commented Nov 14, 2019

@rossberg Comparing that logic table with the previous logic tables is apples to oranges because all the complexity has been hidden in computing the bounds, and in particular you have made the segment length a mutable property. Perhaps I should have formalized the columns as predicates over the initial conditions and a program trace to prevent the complexity from being hidden by changed definitions.

@rossberg
Member

@tlively, possibly; I wasn't sure how exactly to interpret your tables, or why a column can be elided in the end.

To clarify, with the semantics I propose the code generated for an init would be

  1. if len = 0, do nothing
  2. else if offset+len > segment.len, trap
  3. else, copy

If you need to distinguish dropped segments it will have to be

  1. if segment.dropped, trap
  2. else if len = 0, do nothing
  3. else if offset+len > segment.len, trap
  4. else, copy

That is, one extra check and error case and one extra state flag.

In fact, now that we have reintroduced the upfront bounds check I wonder whether special-casing zero is still worth it. We could go to the bare minimum:

  1. if offset+len > segment.len, trap
  2. else, copy
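A sketch of how that bare minimum could look in generated code, assuming data.drop shrinks the segment to length 0 (illustrative C, not any particular engine's codegen):

```c
#include <stdint.h>
#include <string.h>
#include <stdlib.h>

static uint8_t memory[65536];
static void trap(void) { exit(1); }

/* Illustrative segment type; data.drop is assumed to set len to 0. */
typedef struct { const uint8_t *bytes; uint32_t len; } Segment;

/* One comparison covers out-of-bounds, dropped, and (nonzero-offset)         */
/* zero-length cases alike; there is no separate dropped flag to check.       */
static void memory_init_bare(const Segment *s, uint32_t dst,
                             uint32_t src, uint32_t len) {
  if ((uint64_t)src + len > s->len) trap();     /* dropped => s->len == 0     */
  memcpy(memory + dst, s->bytes + src, len);    /* memory bounds check elided */
}

int main(void) {
  static const uint8_t bytes[] = {1, 2, 3};
  Segment live = {bytes, 3}, dropped = {bytes, 0};
  memory_init_bare(&live, 0, 0, 3);             /* ok                         */
  memory_init_bare(&dropped, 0, 1, 0);          /* traps: 1 + 0 > 0           */
  return 0;
}
```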

@tlively
Member Author

tlively commented Nov 15, 2019

Interesting! Would a memory.init with zero offset and zero length of a dropped segment trap? I would prefer that it does, but my guess is that you would prefer that it doesn’t.

Due to other decisions, my work on Binaryen is no longer blocked on resolving this issue. We both agree that there is room to simplify the current memory.init semantics and the details are no longer so important to me. I will be happy as long as some simplification occurs.

@eqrion
Contributor

eqrion commented Nov 15, 2019

> @eqrion, you wouldn't mutate the byte vector itself, you would mutate the pointer to point to a singleton zero-sized one (instead of nulling it). Dropping itself is pretty much the same, but you save an extra null check in every use of the segment.

Ah interesting. Those semantics make sense and would simplify the code.

> In fact, now that we have reintroduced the upfront bounds check I wonder whether special-casing zero is still worth it. We could go to the bare minimum:
>
>   1. if offset+len > segment.len, trap
>   2. else, copy

I wasn't around for the decision to special-case zero. Was there a code-perf reason for it, or was it just to reduce spec complexity? Maybe @lars-t-hansen knows.

@rossberg
Member

@tlively, yes! I just realised lying awake in bed that my further simplification would actually give you what you want! So we should propose that.

@eqrion, it was me who proposed that the null (zero-size) case wouldn't trap, because it was a simplification at the time. However, that was basically turned around by our recent decision to revert to checking OOB beforehand, because that check now needs an exception for the 0-sized case. So my suggestion now is to also revert the 0-size behaviour and regain overall consistency.

@eqrion
Contributor

eqrion commented Nov 20, 2019

@rossberg I talked about this with @lars-t-hansen today and I think we're both okay with this change.

How much consensus would we need to make this change?

@rossberg
Member

Okay, it sounds like the main stakeholders are on board. In that case, we could probably go ahead and create a PR -- I'm happy to do the implementation and spec, but might appreciate some help with the test case generation. (We could instead bring it up at a CG meeting first, but unfortunately I will not be able to attend the next one.)

@rossberg
Member

Okay, I started a PR: #126.

@rossberg
Member

#126 landed. @tlively, does that mean that this issue can be closed?

@tlively
Member Author

tlively commented Nov 23, 2019

Yes, thanks for a productive discussion 👍

@tlively tlively closed this as completed Nov 23, 2019
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Nov 27, 2019
…ength past end of bounds. r=lth

Spec Issue: WebAssembly/bulk-memory-operations#124

The inline path for memory.copy/fill is updated to fall back to the OOL path
when the length is 0, to get proper bounds-checking behavior.

Differential Revision: https://phabricator.services.mozilla.com/D54599

kripken pushed a commit to WebAssembly/binaryen that referenced this issue Aug 24, 2020
According to changes in spec:
WebAssembly/bulk-memory-operations#124
WebAssembly/bulk-memory-operations#145

we unfortunately can't fold to a nop even for memory.copy(x, y, 0).

So this PR reverts all such reductions to nop, keeping them only under the ignoreImplicitTraps flag.
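For context, a sketch of why the fold is no longer valid under the updated semantics (illustrative C, not Binaryen or engine code):

```c
#include <stdint.h>
#include <string.h>
#include <stdlib.h>

static uint8_t memory[65536];
static void trap(void) { exit(1); }

/* memory.copy after the spec changes above: bounds are checked even when     */
/* len == 0, so a zero-length copy with an out-of-range address still traps   */
/* and cannot be folded to a nop without changing behavior.                   */
static void memory_copy(uint32_t dst, uint32_t src, uint32_t len) {
  if ((uint64_t)dst + len > sizeof memory ||
      (uint64_t)src + len > sizeof memory) trap();
  memmove(memory + dst, memory + src, len);
}

int main(void) {
  memory_copy(0, 0, 0);          /* ok: in bounds, nothing copied             */
  memory_copy(1 << 20, 0, 0);    /* traps despite len == 0                    */
  return 0;
}
```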