-
Notifications
You must be signed in to change notification settings - Fork 27
Zero-sized memory.init of dropped segments should trap #124
Comments
The rationale for the current semantics is that a dropped segment behaves like a zero-sized one and both failure conditions are essentially the same error category. Arguably, that is the simplest and most natural semantics, and jits to the simplest code. In fact, I was about to propose that we treat drop exactly like a shrink-to-zero, because that would get rid of having to track the dropped state as a separate piece of information, which is kind of annoying atm. (The only observable difference would be that dropping a segment twice would be a nop instead of a trap.) So, hm, what you propose would make this simplification impossible. Could you solve your use case by always keeping at least one byte for memory.fill? That may be slightly less elegant, but in practice, is it much different from doing a zero-sized fill? Another question is why you even care about maintaining the error. The trap is a fatal failure, i.e., the change in semantics only affects broken programs. Does that matter for your use case? Tangential question: Why only replace runs of zeroes? Wouldn't the same optimisation apply for any byte value? |
This is certainly possible, but would be an extra complication in a pass that is already relatively complicated. I see the advantages for spec simplicity of specifying dropping as shrinking to zero, but in practice that will not make code any simpler because it simply moves the statefulness elsewhere. The change I proposed is a similar spec simplification although it leads to a different semantics. The cost of my proposed change is that engines would have to track dropped state explicitly instead of implementing drops as resizes to zero, but that seems a small cost to pay. Basically the choice of simplification to make comes down to a decision about whether drop state should be kept explicit or should be implicit in the length of the segment.
Binaryen optimizations in general try to preserve program semantics, including trapping behavior. Binaryen has a flag for ignoring implicit traps, but it is not on by default because we have found it to be too unsafe for general use. There is perhaps room to adjust the default policy in safe way, but that would be a broader and separate discussion.
Yes, it certainly would. I'm starting with zeroes because this is a pre-existing optimization pass that similarly splits active segments around runs of zeroes, and for active segments only zeroes make sense. |
I don't think that's correct. As you observe, what I propose would simplify the code that engines have to generate and the state they have to maintain: there would be only one variable and one condition to check. It removes a case. In contrast, your proposal adds a case. How bad is the complication to the optimisation? Naively, it doesn't seem that hard to stop at 1 vs 0. |
In the following tables zero-length implies "in bounds." The current cases:
If zero-length
But the first two columns contain enough information to determine the result, so this simplifies to:
So I think it is indisputable that allowing zero-length |
If I understand the discussion here, I don't believe 'data.drop shrinks segment to 0' would simplify SpiderMonkey's implementation. We use refcounted immutable data-segments between threads and so we cannot mutate the byte vector to drop it. Meaning we'll always need to check if the data-segment has been dropped before looking at the length of it. I do think that giving precedence to zero-sized input over dropped state in determining trapping might be convenient in the future for implementations to not emit code if they determine from analysis that len=0. Whereas they'd have to emit a stub check for if the segment was dropped and trap, but perform no copy. Today we just use an OOL call with no analysis, so either way that's chosen would only involve flipping the order of two if statements. I don't think either way is significantly better, so I don't have a strong opinion. |
@tlively, if I interpret your argument correctly then in fact, what I am saying is that simplifying dropping to shrink-to-zero turns the table into the bare minimum:
@eqrion, you wouldn't mutate the byte vector itself, you would mutate the pointer to point to a singleton zero-sized one (instead of nulling it). Dropping itself is pretty much the same, but you save an extra null check in every use of the segment. |
@rossberg Comparing that logic table with the previous logic tables is apples to oranges because all the complexity has been hidden in computing the bounds, and in particular you have made the segment length a mutable property. Perhaps I should have formalized the columns as predicates over the initial conditions and a program trace to prevent the complexity from being hidden by changed definitions. |
@tlively, possibly, I wasn’t sure how to exactly interpret your tables, or why the a column can be elided in the end. To clarify, with the semantics I propose the code generated for an init would be
If you need to distinguish dropped segments it will have to be
That is, one extra check and error case and one extra state flag. In fact, now that we have reintroduced the upfront bounds check I wonder whether special-casing zero is still worth it. We could go to the bare minimum:
|
Interesting! Would a memory.init with zero offset and zero length of a dropped segment trap? I would prefer that it does, but my guess is that you would prefer that it doesn’t. Due to other decisions, my work on Binaryen is no longer blocked on resolving this issue. We both agree that there is room to simplify the current memory.init semantics and the details are no longer so important to me. I will be happy as long as some simplification occurs. |
Ah interesting. Those semantics make sense and would simplify the code.
I wasn't around for the decision to special case zero. Was there a code perf reason for this or was it just for reducing spec complexity? Maybe @lars-t-hansen |
@tlively, yes! I just realised lying awake in bed that my further simplification would actually give you what you want! So we should propose that. @eqrion, it was me who proposed that the null case wouldn't throw, because it was a simplification at the time. However, that was basically turned around by our recent decision to revert to check OOB beforehand, because that check now needs an exception for 0-sized. So my suggestion now is to also revert the 0-size behaviour and regain overall consistency again. |
@rossberg I talked about this with @lars-t-hansen today and I think we're both okay with this change. How much consensus would we need to make this change? |
Okay, it sounds like the main stake holders are on board. In that case, we could probably go ahead and create a PR -- I'm happy do the implementation and spec, but might appreciate some help with the test case generation. (We could instead bring it up at a CG meeting first. But unfortunately, I will not be able to attend the next one.) |
Okay, I started a PR: #126. |
Yes, thanks for a productive discussion 👍 |
…ength past end of bounds. r=lth Spec Issue: WebAssembly/bulk-memory-operations#124 The inline path for memory.copy/fill are updated to fallback to the OOL path when the length is 0 to have proper bounds checking behavior. Differential Revision: https://phabricator.services.mozilla.com/D54599 --HG-- extra : moz-landing-system : lando
…ength past end of bounds. r=lth Spec Issue: WebAssembly/bulk-memory-operations#124 The inline path for memory.copy/fill are updated to fallback to the OOL path when the length is 0 to have proper bounds checking behavior. Differential Revision: https://phabricator.services.mozilla.com/D54599
…ength past end of bounds. r=lth Spec Issue: WebAssembly/bulk-memory-operations#124 The inline path for memory.copy/fill are updated to fallback to the OOL path when the length is 0 to have proper bounds checking behavior. Differential Revision: https://phabricator.services.mozilla.com/D54599 UltraBlame original commit: 81832b228e1684f421ffe1a0c2f2f3597afc6e63
…ength past end of bounds. r=lth Spec Issue: WebAssembly/bulk-memory-operations#124 The inline path for memory.copy/fill are updated to fallback to the OOL path when the length is 0 to have proper bounds checking behavior. Differential Revision: https://phabricator.services.mozilla.com/D54599 UltraBlame original commit: 81832b228e1684f421ffe1a0c2f2f3597afc6e63
…ength past end of bounds. r=lth Spec Issue: WebAssembly/bulk-memory-operations#124 The inline path for memory.copy/fill are updated to fallback to the OOL path when the length is 0 to have proper bounds checking behavior. Differential Revision: https://phabricator.services.mozilla.com/D54599 UltraBlame original commit: 81832b228e1684f421ffe1a0c2f2f3597afc6e63
According to changes in spec: WebAssembly/bulk-memory-operations#124 WebAssembly/bulk-memory-operations#145 we unfortunately can't fold to nop even for memory.copy(x, y, 0). So this PR revert all reductions to nop but do this only under ignoreImplicitTraps flag
I'm currently working on generalizing Binaryen's memory-packing pass to optimize passive segments in addition to the active segments it already optimizes. The main idea is that when there is a long run of zeroes in a data segment, that data segment is split into two segments that do not contains those zeroes and any
memory.init
instructions that reference that segment are split into a sequence ofmemory.init
andmemory.fill
instructions.The problem is that if a
memory.init
instruction uses a section of a data segment that is only comprised of zeroes, it naively would be replaced with a singlememory.fill
instruction. But that would not be a correct transformation because thememory.fill
instruction would not trap if the original data segment would have been dropped. Emulating the trapping behavior correctly would involve creating new globals to track the hidden state of each of the data segments that are split, which is unnecessarily complex. If zero-sizedmemory.init
instructions could trap, a simpler solution would be to insert a zero-sizememory.init
of one of the split data segments before thememory.fill
.My understanding is that zero-length
memory.init
currently does not trap as a matter of convenience, so would anyone object to changing this behavior?The text was updated successfully, but these errors were encountered: