[LSR][slow compilation] LSR 10000x slower for some X86-64 architectures than others #144858

Open
@jeanPerier

Description

When compiling a Fortran test with flang (source available here), compilation was found to be more than 1000 times slower when targeting most X86-64 CPUs (icelake-server, znver2, znver3, emeraldrapids, skylake) than when targeting znver4 or znver5.

The slow compilations take more than 100 s, and 99.9% of that time is spent in the "Loop Strength Reduction" pass according to -ftime-report.

It looks like a lot of time is spent in CompareSCEVComplexity under llvm::ScalarEvolution::getAddExpr called from LSRInstance::GenerateReassociationsImpl.

Attached are:

  • clang_repro.ll.txt, which contains the IR produced by flang. The compilation time difference can be seen by compiling it with clang -O3 -march=znver4 vs clang -O3 -march=icelake-server.
  • lsr_repro.ll.txt, which is the IR taken from a slow compilation right before LSR. The slow LSR step can be reproduced from it with opt -loop-reduce.
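
For anyone who wants to script the comparison, here is a minimal timing sketch built around the two commands quoted above. It makes several assumptions that are not part of the report: the attachments have been saved locally with a `.ll` suffix, `clang` and `opt` are on `PATH`, `-c` with a discarded output file is added to clang so that no link step runs, and `-disable-output` is added to opt so that no IR is written. The helper names are hypothetical.

```python
import os
import subprocess
import time


def build_clang_cmd(ir_path, march, clang="clang"):
    """clang -O3 -march=<cpu> as in the report; -c and -o are assumptions added here."""
    return [clang, "-O3", f"-march={march}", "-c", ir_path, "-o", os.devnull]


def build_opt_cmd(ir_path, opt="opt"):
    """opt -loop-reduce as in the report; -disable-output is an assumption added here."""
    return [opt, "-loop-reduce", "-disable-output", ir_path]


def time_command(cmd, timeout=None):
    """Run cmd and return its wall-clock duration in seconds (raises if it fails)."""
    start = time.perf_counter()
    subprocess.run(
        cmd,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return time.perf_counter() - start


def compare_targets(ir_path, fast="znver4", slow="icelake-server", timeout=None):
    """Time the same IR for a fast and a slow target and report the slowdown ratio."""
    fast_s = time_command(build_clang_cmd(ir_path, fast), timeout)
    slow_s = time_command(build_clang_cmd(ir_path, slow), timeout)
    return {"fast_s": fast_s, "slow_s": slow_s, "ratio": slow_s / fast_s}
```

Calling `compare_targets("clang_repro.ll")` would time both targets, and `time_command(build_opt_cmd("lsr_repro.ll"))` would time the isolated LSR step. Passing a `timeout` keeps the slow case from blocking a scripted run.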

Note that lsr_repro.ll does not contain CPU-specific attributes, so the architecture does not seem to affect LSR speed directly. Rather, it affects the IR that reaches LSR through earlier optimizations, and LSR chokes on that IR even though it looks reasonable (a bit more than 1000 IR ops). There is likely quadratic behavior somewhere in this path.
