[Flang] Fix performance issue in 549.fotonik3d_r #58303
Comments
The single-sentence answer to this is "we're lacking alias analysis". Here are a few more words. If anyone wants further details, please feel free to ask, and I will try to answer any questions.
With flang compiled from llvm/main, compared against two versions of gfortran (9.4.0 and 13.0, top of tree as of a few days ago), the total runtime of 549.fotonik3d_r is around 2.6-3 times longer with Flang than with gfortran, on the same hardware platform and using -O3 for both (-Ofast lost a bit of time with gfortran 13.0; I didn't try -Ofast with gfortran 9.4).
llvm-flang vs gfortran 9.4:
The relative number is calculated from the percentage of total runtime, as llvm-flang time divided by gfortran time. So, say gfortran ran in 200s: 26.16% of that would be 52.3s, and the time in llvm-flang is about 149s (2.85 x 52.3s). These are NOT the actual numbers; they only explain the calculation. Percentage figures are as per
There are differences in the use of fma-type operations and vector operations (the vector operations may get resolved with alias analysis; the fma operations would probably happen if an -Ofast flag were available). Hacking the compiler to use fma aggressively gives about a 3% overall improvement, so from 2.65 to 2.6 times slower against 9.4. I didn't make this precise comparison with 13.0. With ScopedNoAliasAA hacked to always say "no alias", combined with the "force use of fma" hack, the 9.4 comparison comes down to 1.23x slower, and the 13.0 comparison to 1.37x slower.
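The relative-number calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, using the same made-up figures from the comment (200s, 26.16%, 149s); the function names `region_seconds` and `relative_slowdown` are hypothetical and not part of any profiling tool.

```python
def region_seconds(total_runtime_s, share_pct):
    """Convert a profile percentage into seconds for one code region."""
    return total_runtime_s * share_pct / 100.0


def relative_slowdown(flang_region_s, gfortran_region_s):
    """llvm-flang time divided by gfortran time for the same region."""
    return flang_region_s / gfortran_region_s


# Illustrative numbers from the comment, NOT measured results.
gfortran_region = region_seconds(200.0, 26.16)  # about 52.3 s
flang_region = 149.0                            # assumed flang time for the region
ratio = relative_slowdown(flang_region, gfortran_region)

print(f"gfortran region: {gfortran_region:.1f}s, relative: {ratio:.2f}x")
```

A ratio above 1 means the region takes longer in the llvm-flang build than in the gfortran build.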
Same table for 13.0 comparison:
(Note that |
I approached this a different way on an x86 CPU, but the conclusion is the same. I took the innermost loop in the loopnest at line 2228 in
There are multiple loop nests like this one in |
We need either full TBAA, as in classic flang, or the full restrict patches to get the benefits in fotonik. |
I'm working on classic-flang-like TBAA here: https://discourse.llvm.org/t/rfc-propagate-fir-alias-analysis-information-using-tbaa/73755 |
TBAA tags are enabled by default, but fotonik is still a bit slower than classic flang so I will leave this ticket open for now. |
Hi Tom, I see performance degradations on 549.fotonik3d (-33%), 437.leslie3d (-168%) and 459.GemsFDTD (-27%) on x86, and 437.leslie3d (-217%) and 459.GemsFDTD (-74%) on aarch64. I tried 437 with Are these regressions expected? |
Hi Slava, thanks for letting me know. No, these were not expected. I hadn't tried on x86, but the aarch64 regressions are a big surprise. I will look into this immediately. Would you like me to revert enabling alias tags by default in the meantime? |
Just to confirm, I have reproduced the aarch64 issue and am looking into it. After that, Mats or I will look at x86. The aarch64 regression is not present when LTO is used. GemsFDTD is 28% faster with the TBAA tags when LTO is enabled. |
Thank you for looking into this, Tom! If the investigation/resolution takes more than a couple of days, I would prefer reverting it. |
This reverts commit caba031. Serious performance regressions were reported by @vzakhari llvm#58303 (comment) Fixing this doesn't look quick so I will revert for now.
It has been reverted: #73821 |
Thank you, Tom! It turns out the 437.leslie3d regression is the only real one on my side (both on aarch64 and x86). GemsFDTD and fotonik3d actually got nice improvements from this change (both on aarch64 and x86). capacita_11 slowed down by 8% on x86, but I would not bother about it at this point. |
fotonik benchmark performs poorly with Flang. It runs almost three times slower than with other compilers. Investigate the performance issue.