[opt] store properties of stack alloc_ref in tuple #30810

zoecarver · 2020-04-04T19:29:17Z

As discussed in the above PRs the SIL optimizer is much better at optimizing tuples than classes. This patch moves all stored properties of a class into a tuple so that they are able to be optimized before being stored back into the class. If possible, the class is removed altogether.

This is only one of the many possible ways to implement this optimization. If we decide that this is the path we want to take, I will add more extensive tests.

If possible, create a tuple that holds the stored properties of a class and replace as many uses of the class as possible with the tuple. Then store the tuple elements into the class properties. This allows both LLVM and the SIL optimizer to produce better codegen.

…s are replaced

zoecarver · 2020-04-04T19:30:05Z

@swift-ci please benchmark

* Ref element addrs that had uses outside "firstUnknonwUse" would still be replaced. * Some strong releases need to be removed when erasing the alloc_ref.

zoecarver · 2020-04-05T21:36:10Z

@swift-ci please benchmark

zoecarver · 2020-04-05T21:37:03Z

@swift-ci please benchmark

swift-ci · 2020-04-05T22:05:03Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
EqualSubstringSubstring	23	30	+30.4%	0.77x
LessSubstringSubstring	23	30	+30.4%	0.77x
EqualStringSubstring	23	30	+30.4%	0.77x
EqualSubstringSubstringGenericEquatable	23	30	+30.4%	0.77x
EqualSubstringString	23	30	+30.4%	0.77x
LessSubstringSubstringGenericComparable	23	30	+30.4%	0.77x
RecursiveOwnedParameter	54	65	+20.4%	0.83x
PrefixSequenceLazy	22	26	+18.2%	0.85x
PrefixSequence	22	26	+18.2%	0.85x
ObjectiveCBridgeStringHash	72	84	+16.7%	0.86x
StringComparison_longSharedPrefix	321	358	+11.5%	0.90x (?)
ReversedDictionary2	220	241	+9.5%	0.91x (?)
StringUTF16Builder	210	230	+9.5%	0.91x (?)
ObjectiveCBridgeStubToNSDate2	330	360	+9.1%	0.92x (?)
ObjectiveCBridgeStubToNSStringRef	68	74	+8.8%	0.92x (?)
ObjectiveCBridgeStringIsEqual	168	182	+8.3%	0.92x (?)
MapReduceLazyCollectionShort	26	28	+7.7%	0.93x

Improvement	OLD	NEW	DELTA	RATIO
PrefixAnySequenceLazy	1058	26	-97.5%	40.69x
PrefixWhileAnySequenceLazy	1059	40	-96.2%	26.47x
DropWhileAnySequenceLazy	1411	58	-95.9%	24.33x
DropWhileAnySequence	2131	1272	-40.3%	1.68x
SuffixAnyCollection	10	7	-30.0%	1.43x
DropLastAnyCollection	10	7	-30.0%	1.43x
PrefixAnySequence	1755	1352	-23.0%	1.30x
DropFirstAnySequence	2104	1674	-20.4%	1.26x (?)
PrefixWhileAnyCollection	24	20	-16.7%	1.20x (?)
PrefixWhileAnyCollectionLazy	25	21	-16.0%	1.19x
PrefixWhileAnySeqCRangeIterLazy	25	21	-16.0%	1.19x
PrefixWhileAnySeqCntRangeLazy	25	21	-16.0%	1.19x
DropFirstAnySequenceLazy	1376	1192	-13.4%	1.15x
PrefixAnyCollection	24	21	-12.5%	1.14x (?)
DropFirstAnyCollection	24	21	-12.5%	1.14x (?)
DropWhileAnyCollection	24	21	-12.5%	1.14x (?)
NormalizedIterator_fastPrenormal	680	610	-10.3%	1.11x
NormalizedIterator_latin1	230	212	-7.8%	1.08x (?)
AngryPhonebook.Strasse	155	143	-7.7%	1.08x (?)
PrefixWhileAnySequence	1164	1075	-7.6%	1.08x (?)
AngryPhonebook.Cyrillic	186	172	-7.5%	1.08x (?)
AngryPhonebook.Armenian	173	160	-7.5%	1.08x (?)

Code size: -O

Improvement	OLD	NEW	DELTA	RATIO
PrefixWhile.o	18212	15317	-15.9%	1.19x
DropWhile.o	18386	15643	-14.9%	1.18x
Prims.o	12875	12097	-6.0%	1.06x
Prefix.o	18608	17808	-4.3%	1.04x
DropFirst.o	19932	19124	-4.1%	1.04x
ObserverForwarderStruct.o	2852	2772	-2.8%	1.03x
ObserverUnappliedMethod.o	5089	5009	-1.6%	1.02x
RecursiveOwnedParameter.o	2270	2238	-1.4%	1.01x
ObserverClosure.o	2478	2446	-1.3%	1.01x
ObserverPartiallyAppliedMethod.o	2525	2493	-1.3%	1.01x
Hanoi.o	3120	3088	-1.0%	1.01x

zoecarver · 2020-04-05T22:16:29Z

The observer benchmarks seemed to have huge improvements in #30743 but I don't see them here. I'll see if I can figure out why.

I think most of the regression comes from alloc_refs with tail elements. Those don't seem to have much if any benefit from this optimization and get hurt (as seen in the substring tests especially). I'll try skipping those and see what the impact is.

#30812 and this patch together will be really great.

It seems like this may be causing some regressions without gains in any cases so, skipping those alloc_refs is benefitial.

zoecarver · 2020-04-06T18:27:30Z

@swift-ci please benchmark

zoecarver · 2020-04-06T18:27:54Z

@swift-ci please benchmark

swift-ci · 2020-04-06T18:47:57Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
EqualSubstringSubstring	23	30	+30.4%	0.77x
LessSubstringSubstring	23	30	+30.4%	0.77x
EqualStringSubstring	23	30	+30.4%	0.77x
EqualSubstringSubstringGenericEquatable	23	30	+30.4%	0.77x
EqualSubstringString	23	30	+30.4%	0.77x
LessSubstringSubstringGenericComparable	23	30	+30.4%	0.77x
RecursiveOwnedParameter	54	65	+20.4%	0.83x
PrefixSequenceLazy	22	26	+18.2%	0.85x
PrefixSequence	22	26	+18.2%	0.85x
ObjectiveCBridgeStringHash	72	84	+16.7%	0.86x
StringComparison_longSharedPrefix	321	358	+11.5%	0.90x (?)
ReversedDictionary2	219	241	+10.0%	0.91x (?)
ObjectiveCBridgeStringGetASCIIContents	293	316	+7.8%	0.93x (?)
ObjectiveCBridgeStringIsEqual	169	182	+7.7%	0.93x (?)
MapReduceLazyCollectionShort	26	28	+7.7%	0.93x

Improvement	OLD	NEW	DELTA	RATIO
PrefixAnySequenceLazy	1058	26	-97.5%	40.69x
PrefixWhileAnySequenceLazy	1059	40	-96.2%	26.47x
DropWhileAnySequenceLazy	1411	58	-95.9%	24.33x
DropWhileAnySequence	2131	1352	-36.6%	1.58x
SuffixAnyCollection	10	7	-30.0%	1.43x
DropLastAnyCollection	10	7	-30.0%	1.43x
PrefixAnySequence	1688	1326	-21.4%	1.27x
DropFirstAnySequence	2104	1674	-20.4%	1.26x
PrefixWhileAnyCollection	24	20	-16.7%	1.20x
PrefixWhileAnyCollectionLazy	25	21	-16.0%	1.19x
PrefixWhileAnySeqCRangeIterLazy	25	21	-16.0%	1.19x (?)
PrefixWhileAnySeqCntRangeLazy	25	21	-16.0%	1.19x
PrefixAnyCollection	24	21	-12.5%	1.14x
DropFirstAnyCollection	24	21	-12.5%	1.14x (?)
DropWhileAnyCollection	24	21	-12.5%	1.14x
DropFirstAnySequenceLazy	1412	1237	-12.4%	1.14x (?)
NormalizedIterator_fastPrenormal	680	610	-10.3%	1.11x
Data.hash.Medium	33	30	-9.1%	1.10x (?)
AngryPhonebook.Strasse	155	142	-8.4%	1.09x (?)
NormalizedIterator_latin1	230	212	-7.8%	1.08x (?)
PrefixWhileAnySequence	1165	1074	-7.8%	1.08x (?)
AngryPhonebook.Armenian	173	160	-7.5%	1.08x (?)
ObjectiveCBridgeStubFromNSDateRef	3120	2890	-7.4%	1.08x (?)

Code size: -O

Improvement	OLD	NEW	DELTA	RATIO
PrefixWhile.o	18212	15317	-15.9%	1.19x
DropWhile.o	18386	15643	-14.9%	1.18x
Prims.o	12875	12097	-6.0%	1.06x
Prefix.o	18608	17808	-4.3%	1.04x
DropFirst.o	19932	19124	-4.1%	1.04x
ObserverForwarderStruct.o	2852	2772	-2.8%	1.03x
ObserverUnappliedMethod.o	5089	5009	-1.6%	1.02x
RecursiveOwnedParameter.o	2270	2238	-1.4%	1.01x
ObserverClosure.o	2478	2446	-1.3%	1.01x
ObserverPartiallyAppliedMethod.o	2525	2493	-1.3%	1.01x
Hanoi.o	3120	3088	-1.0%	1.01x

zoecarver · 2020-04-06T18:49:18Z

Welp. I guess that wasn't the solution 😕

Sorry for waisting buildbot resources.

zoecarver · 2020-04-13T23:59:20Z

@eeckstein sorry for all the messages. What do you think of the direction here? If you think I should go ahead with this patch then I can look into the build errors / tests.

eeckstein · 2020-04-14T15:02:08Z

@zoecarver I think your optimization makes sense. I still have to review the changes in detail.

zoecarver · 2020-04-14T15:49:43Z

@eeckstein no rush. I'll add some more tests/clean it up a bit.

eeckstein · 2020-04-15T14:38:32Z

@zoecarver You approach using the dominator tree is not working, because the dominance relation cannot be used to decide if one instruction is executed "before" another instruction.
For example:

  a
  if cond {
    b
  } else {
    c
  }
  d

All three blocks 'b', 'c' and 'd', are direct dominator children of 'a', which means it's not telling you that 'd' is executed after 'b' and 'c'.
Let's say that b would be the firstUnknownUse. It does not dominate 'd' although if cond is true, it will be executed before 'd'.

Even sorting (in getOrderedNonDebugUses) will not work because the dominance relation is not a strict order ('not a dom b' does not imply 'b dom a').

What you would need to solve this problem is a data flow analysis.

Thinking about it again, I released that we already have this kind of optimization: it's redundant load elimination and dead store elimination. These optimizations are doing the data flow analysis required to solve the problem. The only difference is, they are not grouping the object content into a tuple, but use a separate SSA value for each property.

I tried a simple example and found: it does not work! Looks like that access markers (begin_access, end_access) prevent redundant load elimination. We should fix this.

PS: Please don't be discouraged by my review comments. I really like that you are working on the optimizer!

zoecarver · 2020-04-15T23:12:20Z

@eeckstein that's good to know. Thanks for all that information. I didn't realize load/store elimination already did this, I agree, updating those will be a much better solution than this patch. Which is kind of a hack.

I was running into the begin/end access issue in this patch too, that's why I created #30812.

I don't think we ever completely remove alloc_refs even if they're not used. If that's true, I'll try to fix that too because otherwise, their lifetime functions get in the way of llvm optimizations.

zoecarver · 2020-04-15T23:17:10Z

Just so I make sure I understand what you're saying with the example you gave, this implementation won't ever cause miscompiles but, it might miss some things that it could optimize, right? In other words, using dominance info in the way I am is "safe" but misses some cases?

zoecarver · 2020-04-16T18:21:38Z

@eeckstein fixed in #31078 :)

eeckstein · 2020-04-17T09:17:38Z

To answer you question: no, this implementation would cause miscompiles (most likely)

zoecarver added 3 commits April 4, 2020 11:20

Cleanup & comment tryPromoteToObject

3cea7d1

Add stack promotion test to test that alloc_refs are removed and user…

ce0270e

…s are replaced

zoecarver requested a review from eeckstein April 4, 2020 19:29

zoecarver mentioned this pull request Apr 4, 2020

[opt] Teach mem2reg about begin_access. #30812

Closed

Fix issues exposed by benchmarks:

3886bd8

* Ref element addrs that had uses outside "firstUnknonwUse" would still be replaced. * Some strong releases need to be removed when erasing the alloc_ref.

This was referenced Apr 6, 2020

[sil] add instruction copy_to_ref #30787

Closed

[opt] promote stack alloc_ref to object #30743

Closed

Skip alloc_ref with tail elements

5ca96d9

It seems like this may be causing some regressions without gains in any cases so, skipping those alloc_refs is benefitial.

zoecarver closed this Apr 16, 2020

[opt] store properties of stack alloc_ref in tuple #30810

[opt] store properties of stack alloc_ref in tuple #30810

Uh oh!

Conversation

zoecarver commented Apr 4, 2020

Uh oh!

zoecarver commented Apr 4, 2020

Uh oh!

zoecarver commented Apr 5, 2020

Uh oh!

zoecarver commented Apr 5, 2020

Uh oh!

swift-ci commented Apr 5, 2020

Performance: -O

Code size: -O

Uh oh!

zoecarver commented Apr 5, 2020

Uh oh!

zoecarver commented Apr 6, 2020

Uh oh!

zoecarver commented Apr 6, 2020

Uh oh!

swift-ci commented Apr 6, 2020

Performance: -O

Code size: -O

Uh oh!

zoecarver commented Apr 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zoecarver commented Apr 13, 2020

Uh oh!

eeckstein commented Apr 14, 2020

Uh oh!

zoecarver commented Apr 14, 2020

Uh oh!

eeckstein commented Apr 15, 2020

Uh oh!

zoecarver commented Apr 15, 2020

Uh oh!

zoecarver commented Apr 15, 2020

Uh oh!

zoecarver commented Apr 16, 2020

Uh oh!

eeckstein commented Apr 17, 2020

Uh oh!

Uh oh!

zoecarver commented Apr 6, 2020 •

edited

Loading