Skip to content

[opt] store properties of stack alloc_ref in tuple #30810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

zoecarver
Copy link
Contributor

Refs #30736 #30787 #30743

As discussed in the above PRs the SIL optimizer is much better at optimizing tuples than classes. This patch moves all stored properties of a class into a tuple so that they are able to be optimized before being stored back into the class. If possible, the class is removed altogether.

This is only one of the many possible ways to implement this optimization. If we decide that this is the path we want to take, I will add more extensive tests.

If possible, create a tuple that holds the stored properties of a class
and replace as many uses of the class as possible with the tuple. Then
store the tuple elements into the class properties. This allows both
LLVM and the SIL optimizer to produce better codegen.
@zoecarver zoecarver requested a review from eeckstein April 4, 2020 19:29
@zoecarver
Copy link
Contributor Author

@swift-ci please benchmark

  * Ref element addrs that had uses outside "firstUnknonwUse" would
  still be replaced.
  * Some strong releases need to be removed when erasing the alloc_ref.
@zoecarver
Copy link
Contributor Author

@swift-ci please benchmark

1 similar comment
@zoecarver
Copy link
Contributor Author

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Apr 5, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
EqualSubstringSubstring 23 30 +30.4% 0.77x
LessSubstringSubstring 23 30 +30.4% 0.77x
EqualStringSubstring 23 30 +30.4% 0.77x
EqualSubstringSubstringGenericEquatable 23 30 +30.4% 0.77x
EqualSubstringString 23 30 +30.4% 0.77x
LessSubstringSubstringGenericComparable 23 30 +30.4% 0.77x
RecursiveOwnedParameter 54 65 +20.4% 0.83x
PrefixSequenceLazy 22 26 +18.2% 0.85x
PrefixSequence 22 26 +18.2% 0.85x
ObjectiveCBridgeStringHash 72 84 +16.7% 0.86x
StringComparison_longSharedPrefix 321 358 +11.5% 0.90x (?)
ReversedDictionary2 220 241 +9.5% 0.91x (?)
StringUTF16Builder 210 230 +9.5% 0.91x (?)
ObjectiveCBridgeStubToNSDate2 330 360 +9.1% 0.92x (?)
ObjectiveCBridgeStubToNSStringRef 68 74 +8.8% 0.92x (?)
ObjectiveCBridgeStringIsEqual 168 182 +8.3% 0.92x (?)
MapReduceLazyCollectionShort 26 28 +7.7% 0.93x
 
Improvement OLD NEW DELTA RATIO
PrefixAnySequenceLazy 1058 26 -97.5% 40.69x
PrefixWhileAnySequenceLazy 1059 40 -96.2% 26.47x
DropWhileAnySequenceLazy 1411 58 -95.9% 24.33x
DropWhileAnySequence 2131 1272 -40.3% 1.68x
SuffixAnyCollection 10 7 -30.0% 1.43x
DropLastAnyCollection 10 7 -30.0% 1.43x
PrefixAnySequence 1755 1352 -23.0% 1.30x
DropFirstAnySequence 2104 1674 -20.4% 1.26x (?)
PrefixWhileAnyCollection 24 20 -16.7% 1.20x (?)
PrefixWhileAnyCollectionLazy 25 21 -16.0% 1.19x
PrefixWhileAnySeqCRangeIterLazy 25 21 -16.0% 1.19x
PrefixWhileAnySeqCntRangeLazy 25 21 -16.0% 1.19x
DropFirstAnySequenceLazy 1376 1192 -13.4% 1.15x
PrefixAnyCollection 24 21 -12.5% 1.14x (?)
DropFirstAnyCollection 24 21 -12.5% 1.14x (?)
DropWhileAnyCollection 24 21 -12.5% 1.14x (?)
NormalizedIterator_fastPrenormal 680 610 -10.3% 1.11x
NormalizedIterator_latin1 230 212 -7.8% 1.08x (?)
AngryPhonebook.Strasse 155 143 -7.7% 1.08x (?)
PrefixWhileAnySequence 1164 1075 -7.6% 1.08x (?)
AngryPhonebook.Cyrillic 186 172 -7.5% 1.08x (?)
AngryPhonebook.Armenian 173 160 -7.5% 1.08x (?)

Code size: -O

Improvement OLD NEW DELTA RATIO
PrefixWhile.o 18212 15317 -15.9% 1.19x
DropWhile.o 18386 15643 -14.9% 1.18x
Prims.o 12875 12097 -6.0% 1.06x
Prefix.o 18608 17808 -4.3% 1.04x
DropFirst.o 19932 19124 -4.1% 1.04x
ObserverForwarderStruct.o 2852 2772 -2.8% 1.03x
ObserverUnappliedMethod.o 5089 5009 -1.6% 1.02x
RecursiveOwnedParameter.o 2270 2238 -1.4% 1.01x
ObserverClosure.o 2478 2446 -1.3% 1.01x
ObserverPartiallyAppliedMethod.o 2525 2493 -1.3% 1.01x
Hanoi.o 3120 3088 -1.0% 1.01x

@zoecarver
Copy link
Contributor Author

The observer benchmarks seemed to have huge improvements in #30743 but I don't see them here. I'll see if I can figure out why.

I think most of the regression comes from alloc_refs with tail elements. Those don't seem to have much if any benefit from this optimization and get hurt (as seen in the substring tests especially). I'll try skipping those and see what the impact is.

#30812 and this patch together will be really great.

It seems like this may be causing some regressions without gains in any
cases so, skipping those alloc_refs is benefitial.
@zoecarver
Copy link
Contributor Author

@swift-ci please benchmark

1 similar comment
@zoecarver
Copy link
Contributor Author

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Apr 6, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
EqualSubstringSubstring 23 30 +30.4% 0.77x
LessSubstringSubstring 23 30 +30.4% 0.77x
EqualStringSubstring 23 30 +30.4% 0.77x
EqualSubstringSubstringGenericEquatable 23 30 +30.4% 0.77x
EqualSubstringString 23 30 +30.4% 0.77x
LessSubstringSubstringGenericComparable 23 30 +30.4% 0.77x
RecursiveOwnedParameter 54 65 +20.4% 0.83x
PrefixSequenceLazy 22 26 +18.2% 0.85x
PrefixSequence 22 26 +18.2% 0.85x
ObjectiveCBridgeStringHash 72 84 +16.7% 0.86x
StringComparison_longSharedPrefix 321 358 +11.5% 0.90x (?)
ReversedDictionary2 219 241 +10.0% 0.91x (?)
ObjectiveCBridgeStringGetASCIIContents 293 316 +7.8% 0.93x (?)
ObjectiveCBridgeStringIsEqual 169 182 +7.7% 0.93x (?)
MapReduceLazyCollectionShort 26 28 +7.7% 0.93x
 
Improvement OLD NEW DELTA RATIO
PrefixAnySequenceLazy 1058 26 -97.5% 40.69x
PrefixWhileAnySequenceLazy 1059 40 -96.2% 26.47x
DropWhileAnySequenceLazy 1411 58 -95.9% 24.33x
DropWhileAnySequence 2131 1352 -36.6% 1.58x
SuffixAnyCollection 10 7 -30.0% 1.43x
DropLastAnyCollection 10 7 -30.0% 1.43x
PrefixAnySequence 1688 1326 -21.4% 1.27x
DropFirstAnySequence 2104 1674 -20.4% 1.26x
PrefixWhileAnyCollection 24 20 -16.7% 1.20x
PrefixWhileAnyCollectionLazy 25 21 -16.0% 1.19x
PrefixWhileAnySeqCRangeIterLazy 25 21 -16.0% 1.19x (?)
PrefixWhileAnySeqCntRangeLazy 25 21 -16.0% 1.19x
PrefixAnyCollection 24 21 -12.5% 1.14x
DropFirstAnyCollection 24 21 -12.5% 1.14x (?)
DropWhileAnyCollection 24 21 -12.5% 1.14x
DropFirstAnySequenceLazy 1412 1237 -12.4% 1.14x (?)
NormalizedIterator_fastPrenormal 680 610 -10.3% 1.11x
Data.hash.Medium 33 30 -9.1% 1.10x (?)
AngryPhonebook.Strasse 155 142 -8.4% 1.09x (?)
NormalizedIterator_latin1 230 212 -7.8% 1.08x (?)
PrefixWhileAnySequence 1165 1074 -7.8% 1.08x (?)
AngryPhonebook.Armenian 173 160 -7.5% 1.08x (?)
ObjectiveCBridgeStubFromNSDateRef 3120 2890 -7.4% 1.08x (?)

Code size: -O

Improvement OLD NEW DELTA RATIO
PrefixWhile.o 18212 15317 -15.9% 1.19x
DropWhile.o 18386 15643 -14.9% 1.18x
Prims.o 12875 12097 -6.0% 1.06x
Prefix.o 18608 17808 -4.3% 1.04x
DropFirst.o 19932 19124 -4.1% 1.04x
ObserverForwarderStruct.o 2852 2772 -2.8% 1.03x
ObserverUnappliedMethod.o 5089 5009 -1.6% 1.02x
RecursiveOwnedParameter.o 2270 2238 -1.4% 1.01x
ObserverClosure.o 2478 2446 -1.3% 1.01x
ObserverPartiallyAppliedMethod.o 2525 2493 -1.3% 1.01x
Hanoi.o 3120 3088 -1.0% 1.01x

@zoecarver
Copy link
Contributor Author

zoecarver commented Apr 6, 2020

Welp. I guess that wasn't the solution 😕

Sorry for waisting buildbot resources.

@zoecarver
Copy link
Contributor Author

@eeckstein sorry for all the messages. What do you think of the direction here? If you think I should go ahead with this patch then I can look into the build errors / tests.

@eeckstein
Copy link
Contributor

@zoecarver I think your optimization makes sense. I still have to review the changes in detail.

@zoecarver
Copy link
Contributor Author

@eeckstein no rush. I'll add some more tests/clean it up a bit.

@eeckstein
Copy link
Contributor

@zoecarver You approach using the dominator tree is not working, because the dominance relation cannot be used to decide if one instruction is executed "before" another instruction.
For example:

  a
  if cond {
    b
  } else {
    c
  }
  d

All three blocks 'b', 'c' and 'd', are direct dominator children of 'a', which means it's not telling you that 'd' is executed after 'b' and 'c'.
Let's say that b would be the firstUnknownUse. It does not dominate 'd' although if cond is true, it will be executed before 'd'.

Even sorting (in getOrderedNonDebugUses) will not work because the dominance relation is not a strict order ('not a dom b' does not imply 'b dom a').

What you would need to solve this problem is a data flow analysis.

Thinking about it again, I released that we already have this kind of optimization: it's redundant load elimination and dead store elimination. These optimizations are doing the data flow analysis required to solve the problem. The only difference is, they are not grouping the object content into a tuple, but use a separate SSA value for each property.

I tried a simple example and found: it does not work! Looks like that access markers (begin_access, end_access) prevent redundant load elimination. We should fix this.

PS: Please don't be discouraged by my review comments. I really like that you are working on the optimizer!

@zoecarver
Copy link
Contributor Author

@eeckstein that's good to know. Thanks for all that information. I didn't realize load/store elimination already did this, I agree, updating those will be a much better solution than this patch. Which is kind of a hack.

I was running into the begin/end access issue in this patch too, that's why I created #30812.

I don't think we ever completely remove alloc_refs even if they're not used. If that's true, I'll try to fix that too because otherwise, their lifetime functions get in the way of llvm optimizations.

@zoecarver
Copy link
Contributor Author

Just so I make sure I understand what you're saying with the example you gave, this implementation won't ever cause miscompiles but, it might miss some things that it could optimize, right? In other words, using dominance info in the way I am is "safe" but misses some cases?

@zoecarver
Copy link
Contributor Author

@eeckstein fixed in #31078 :)

@zoecarver zoecarver closed this Apr 16, 2020
@eeckstein
Copy link
Contributor

To answer you question: no, this implementation would cause miscompiles (most likely)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants