[Flang] Fix performance issue in 549.fotonik3d_r #58303

Open · kiranchandramohan opened this issue Oct 11, 2022 · 11 comments
Labels: flang, performance

Comments

@kiranchandramohan (Contributor)

The fotonik benchmark performs poorly with Flang: it is almost three times slower than with other compilers. Investigate the performance issue.

kiranchandramohan added the flang label and removed the new issue label on Oct 11, 2022
@Leporacanthicus (Contributor)

So the single-sentence answer to this is "we're lacking alias analysis".

Here are a few more words. If anyone wants further details, please feel free to ask, and I will try to answer any questions.

With flang built from llvm/main, compared against two versions of gfortran (9.4.0 and 13.0, top of tree as of a few days ago), the total runtime of 549.fotonik3d_r is around 2.6-3 times longer for Flang than for gfortran, on the same hardware platform and using -O3 for both (-Ofast lost a bit of time with gfortran 13.0; I didn't try -Ofast with gfortran 9.4).

llvm-flang vs gfortran 9.4:

| Function | %flang | %gfortran | Relative |
| --- | --- | --- | --- |
| material::mat_update | 28.08% | 26.16% | 2.85 |
| UPML::upml_updatee_simple | 24.43% | 20.64% | 3.07 |
| UPML::upml_updateh | 22.36% | 16.21% | 3.67 |
| update::update_h | 12.14% | 15.28% | 2.11 |
| power::power_dft | 11.68% | 19.1% | 1.62 |

The relative number is calculated from the percentage of total runtime, as llvm-flang time divided by gfortran time. For example, if gfortran ran in 200s, then 26.16% of that would be 52.3s, and the corresponding llvm-flang time would be about 149s (2.85 x 52.3s). These are NOT the actual numbers, just an illustration of the calculation. Percentage figures are as reported by perf record/perf report.

There are also differences in the use of FMA-type operations and vector operations (the vector operations may get resolved by alias analysis; the FMA difference would probably go away once an -Ofast flag is available).

Hacking the compiler to use FMA aggressively gives about a 3% overall improvement, so from 2.65x down to 2.6x against 9.4 - I didn't make this precise comparison with 13.0.

Hacking ScopedNoAliasAA to always answer "no alias", combined with the forced use of FMA, brings the comparison down to 1.23x slower against 9.4, and 1.37x slower against 13.0.

| Function | %flang | %gfortran | Relative |
| --- | --- | --- | --- |
| material::mat_update | 34.14% | 26.16% | 1.6 |
| UPML::upml_updatee_simple | 17.48% | 20.64% | 1.04 |
| UPML::upml_updateh | 15.91% | 16.21% | 1.21 |
| update::update_h | 15.46% | 15.28% | 1.25 |
| power::power_dft | 14.20% | 19.1% | 0.91 |

Same table for 13.0 comparison:

| Function | %flang | %gfortran | Relative |
| --- | --- | --- | --- |
| material::mat_update | 34.14% | 23.51% | 2.0 |
| UPML::upml_updatee_simple | 17.48% | 22.14% | 1.06 |
| UPML::upml_updateh | 15.91% | 16.85% | 1.30 |
| update::update_h | 15.46% | 15.63% | 1.36 |
| power::power_dft | 14.20% | 19.1% | 1.03 |

(Note that in 9.4.0, UPML::upml_updatee inlines UPML::upml_updatee_simple, so if you run this experiment yourself you may not see the same names in gfortran as listed above - the inlining strategy in the latest version seems to have changed, so it no longer inlines that rather large function.)

@vzakhari (Contributor)

I approached this in a different way on an x86 CPU, but the conclusion is the same.

I took the innermost loop of the loop nest at line 2228 in UPML::upml_updatee_simple. The initial Flang time is 54.36s for the whole loop nest. I manually added alias.scope/noalias metadata, fast flags, and nsw/nuw flags for the offset and loop induction-variable computations (see the LLVM IR diff in the attached files; a reduced sketch follows the list below). Execution time of the loop nest decreased with each modification:

  • Alias metadata - enables innermost loop vectorization: 27.825s
  • nsw/nuw - gets rid of redundant integer computations inside the innermost loop: 26.06s
  • fast - FMAs: 24.1s
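
For illustration, here is a minimal hand-written sketch of what these three modifications look like on a simple innermost loop. It is not taken from the attached diff; the function, loop, and metadata names are invented:

```llvm
; Hypothetical reduced example:
;  * !alias.scope / !noalias  - tell LLVM the src and dst accesses cannot
;                               overlap, which unblocks vectorization
;  * nuw nsw                  - no-wrap flags on the induction/offset math,
;                               so redundant integer computations fold away
;  * fast                     - fast-math flags on the FP ops, allowing
;                               fmul+fadd to be fused into an FMA
define void @axpy(ptr %dst, ptr %src, double %a, i64 %n) {
entry:
  br label %loop

loop:                                             ; assumes %n >= 1
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %src.addr = getelementptr inbounds double, ptr %src, i64 %i
  %s = load double, ptr %src.addr, align 8, !alias.scope !2
  %dst.addr = getelementptr inbounds double, ptr %dst, i64 %i
  %d = load double, ptr %dst.addr, align 8, !noalias !2
  %m = fmul fast double %a, %s
  %r = fadd fast double %m, %d
  store double %r, ptr %dst.addr, align 8, !noalias !2
  %i.next = add nuw nsw i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret void
}

!0 = distinct !{!0, !"axpy scope domain"}   ; alias scope domain
!1 = distinct !{!1, !0, !"src accesses"}    ; scope covering the src load
!2 = !{!1}                                  ; scope list used above
```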

There are multiple loop nests like this one in UPML::upml_updatee_simple, so they should all benefit from the same LLVM optimizations once the proper metadata/attributes are provided. UPML::upml_updateh consists of loop nests with the same characteristics.

LLVMIRdiff.zip

@kiranchandramohan (Contributor, Author)

We need either full TBAA (as in classic flang) or the full restrict patches to get the benefits in fotonik.

kiranchandramohan changed the title from "[Flang] Investigate performance issue in 549.fotonik3d_r" to "[Flang] Fix performance issue in 549.fotonik3d_r" on Jun 1, 2023
@tblah (Contributor) commented Sep 29, 2023

I'm working on classic-flang-like TBAA here: https://discourse.llvm.org/t/rfc-propagate-fir-alias-analysis-information-using-tbaa/73755
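
Roughly speaking, the idea is to emit per-function TBAA trees in which distinct variables get distinct leaf nodes, so LLVM's TBAA can disambiguate their accesses. A hypothetical, hand-written sketch (the tree layout and node names below are simplified illustrations, not the exact tags the patches emit):

```llvm
; Hypothetical sketch: the load from one module variable and the store to
; another carry sibling TBAA leaf nodes, so TBAA reports NoAlias and the
; surrounding loop can be optimized without runtime overlap checks.
define double @scale(ptr %e_field, ptr %h_field) {
  %h = load double, ptr %h_field, align 8, !tbaa !4
  %v = fmul double %h, 2.500000e-01
  store double %v, ptr %e_field, align 8, !tbaa !6
  ret double %v
}

!0 = !{!"Flang function root scale"}        ; per-function TBAA root
!1 = !{!"any access", !0, i64 0}
!2 = !{!"any data access", !1, i64 0}
!3 = !{!"global data", !2, i64 0}
!4 = !{!5, !5, i64 0}                       ; access tag for h_field
!5 = !{!"global data/h_field", !3, i64 0}
!6 = !{!7, !7, i64 0}                       ; access tag for e_field
!7 = !{!"global data/e_field", !3, i64 0}
```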

@tblah (Contributor) commented Nov 28, 2023

TBAA tags are now enabled by default, but fotonik is still a bit slower than classic flang, so I will leave this ticket open for now.

#73111

@vzakhari (Contributor)

Hi Tom,

I see performance degradations on 549.fotonik3d (-33%), 437.leslie3d (-168%) and 459.GemsFDTD (-27%) on x86, and 437.leslie3d (-217%) and 459.GemsFDTD (-74%) on aarch64.

I tried 437 with -mllvm -disable-fir-alias-tags on Icelake, and the performance was restored. I am using flang-new -Ofast -march=native.

Are these regressions expected?

@tblah (Contributor) commented Nov 29, 2023

Hi Slava,

Thanks for letting me know. No, these were not expected. I hadn't tried on x86, but the aarch64 regressions are a big surprise. I will look into this immediately.

Would you like me to revert enabling alias tags by default in the meantime?

@tblah (Contributor) commented Nov 29, 2023

Just to confirm, I have reproduced the aarch64 issue and am looking into it. After that, Mats or I will look at x86.

The aarch64 regression is not present when LTO is used. GemsFDTD is 28% faster with the TBAA tags when LTO is enabled.

@vzakhari (Contributor)

Thank you for looking into this, Tom! If the investigation/resolution takes more than a couple of days, I would prefer reverting it.

tblah added a commit to tblah/llvm-project that referenced this issue Nov 29, 2023
This reverts commit caba031.

Serious performance regressions were reported by @vzakhari
llvm#58303 (comment)

Fixing this doesn't look quick so I will revert for now.
tblah added a commit that referenced this issue Nov 29, 2023 (the same revert of caba031, with the same message).
@tblah (Contributor) commented Nov 29, 2023

It is reverted in #73821.

@vzakhari (Contributor)

Thank you, Tom! It turns out the 437.leslie3d regression is the only real one on my side (on both aarch64 and x86). GemsFDTD and fotonik3d actually got nice improvements from this change (on both aarch64 and x86). capacita_11 slowed down by 8% on x86, but I would not worry about it at this point.

Status: In Progress · 5 participants