Description
When compiling a Fortran test with flang (source available here), compilation was found to be more than 1000 times slower when targeting most x86-64 CPUs (`icelake-server`, `znver2`, `znver3`, `emeraldrapids`, `skylake`) than when targeting `znver4` or `znver5`.
The slow compilations take more than 100 s, and 99.9% of that time is spent in the "Loop Strength Reduction" pass according to `-ftime-report`.
It looks like a lot of time is spent in `CompareSCEVComplexity` under `llvm::ScalarEvolution::getAddExpr`, called from `LSRInstance::GenerateReassociationsImpl`.
Attached are:
- `clang_repro.ll.txt`, which contains the IR produced by flang. The compilation time difference can be seen by comparing `clang -O3 -march=znver4` vs `clang -O3 -march=icelake-server`.
- `lsr_repro.ll.txt`, which is the IR taken from a slow compilation right before LSR. The slow LSR step can be reproduced on it with `opt -loop-reduce`.
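For convenience, the comparison of the two `clang` invocations can be scripted. The following is only a sketch under several assumptions that are not in the report: the attachment has been renamed to `clang_repro.ll` so that clang treats it as IR, `-c` and `-o /dev/null` are added so that only compilation is timed, and the helper names `build_commands` and `time_command` are made up for illustration.

```python
import shutil
import subprocess
import sys
import time


def build_commands(ir_file="clang_repro.ll", fast="znver4", slow="icelake-server"):
    """Return the two clang invocations to compare, keyed by -march value.

    Assumptions: ir_file is the attachment renamed to end in .ll, and
    "-c ... -o /dev/null" is added so that only compilation is measured.
    """
    return {
        march: ["clang", "-O3", f"-march={march}", "-c", ir_file, "-o", "/dev/null"]
        for march in (fast, slow)
    }


def time_command(cmd):
    """Run cmd and return (wall-clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, result.returncode


if __name__ == "__main__":
    if shutil.which("clang") is None:
        print("clang not found on PATH, nothing to time", file=sys.stderr)
    else:
        for march, cmd in build_commands().items():
            seconds, code = time_command(cmd)
            print(f"-march={march}: {seconds:.1f} s (exit code {code})")
```

A non-zero exit code in the output means the timing for that target should not be trusted (for example, if the input file was not found).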
Note that `lsr_repro.ll` does not contain CPU-specific attributes, so the target architecture does not seem to affect LSR speed directly. Instead, it changes the IR that reaches LSR through the earlier optimizations, and LSR chokes on that IR even though it looks reasonable (a bit more than 1000 IR instructions). There is likely quadratic behavior somewhere here.
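One possible way to check the quadratic hypothesis is to time the LSR step on inputs of several sizes and fit the exponent of the growth on a log-log scale. The sketch below only shows the fitting step. The helper `growth_exponent` is hypothetical, and the timings are synthetic values generated from an exact quadratic, not measurements from this reproducer.

```python
import math


def growth_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    A slope near 1 suggests linear scaling, near 2 quadratic, and so on.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic timings following t = 1e-6 * n**2 exactly; NOT real measurements.
sizes = [250, 500, 1000, 2000]
times = [1e-6 * n ** 2 for n in sizes]
print(round(growth_exponent(sizes, times), 3))  # prints 2.0
```

With real measurements, an exponent well above 2 would point to something worse than quadratic, which the reported 1000x slowdown does not rule out.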