llvm-20: Nightly compile time regression building comrak with release profile #137909

@parasyte

Building comrak with cargo build --release seems to never finish on x86_64 (both on Windows and Linux). It normally builds on the stable channel on my machine in approximately 26 seconds.

Bisect results:

searched nightlies: from nightly-2025-02-15 to nightly-2025-02-25
regressed nightly: nightly-2025-02-18
searched commit range: 5bc6231...ce36a96
regressed commit: ce36a96

bisected with cargo-bisect-rustc v0.6.9

Host triple: x86_64-pc-windows-msvc
Reproduce with:

cargo bisect-rustc --start=2025-02-15 --end=2025-02-25 --script "C:\\Program Files\\Git\\usr\\bin\\bash.exe" -- ./regress.sh

regress.sh:

#!/bin/bash

set -eux -o pipefail

timeout 60 cargo build --release
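For readers without a `timeout` binary at hand, the same "fail if the build takes longer than 60 seconds" check can be sketched in Rust. This is only an illustration of the idea behind regress.sh, not what the bisection actually ran: `run_with_timeout` is a hypothetical helper, the 50 ms polling interval is an arbitrary choice, and exit code 124 is borrowed from GNU `timeout`. Note that killing `cargo` this way does not necessarily kill the `rustc` processes it spawned.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `cmd` and waits at most `limit` for it to exit.
/// Returns `Ok(Some(status))` if it finished in time and `Ok(None)` if it
/// was still running at the deadline and had to be killed.
fn run_with_timeout(cmd: &mut Command, limit: Duration) -> io::Result<Option<ExitStatus>> {
    let mut child = cmd.spawn()?;
    let deadline = Instant::now() + limit;
    loop {
        // `try_wait` returns immediately: Some(status) once the child exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // Ignore the kill result: the child may have exited in between.
            let _ = child.kill();
            child.wait()?; // reap the process so it does not linger
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(50));
    }
}

fn main() -> io::Result<()> {
    let mut build = Command::new("cargo");
    build.args(["build", "--release"]);
    match run_with_timeout(&mut build, Duration::from_secs(60))? {
        Some(status) if status.success() => Ok(()),
        Some(status) => std::process::exit(status.code().unwrap_or(1)),
        None => {
            eprintln!("build did not finish within 60 seconds");
            std::process::exit(124) // same code GNU `timeout` uses
        }
    }
}
```

Any non-zero exit marks the nightly as regressed for cargo-bisect-rustc, so a timed-out build and a failed build are treated the same way here.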

@rustbot modify labels: +regression-from-stable-to-nightly -regression-untriaged

Activity

added C-bug (Category: This is a bug.) and regression-untriaged (Untriaged performance or correctness regression.) on Mar 3, 2025
added I-prioritize (Issue: Indicates that prioritization has been requested for this issue.) and needs-triage (This issue may need triage. Remove it if it has been sufficiently triaged.) and removed regression-untriaged (Untriaged performance or correctness regression.) on Mar 3, 2025
added A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.) on Mar 3, 2025
added I-compiletime (Issue: Problems and improvements with respect to compile times.) on Mar 3, 2025

saethlin (Member) commented on Mar 3, 2025

perf top during a compile of the above looks like this:

Overhead  Shared Object                            Symbol
  44.24%  libLLVM.so.20.1-rust-1.87.0-nightly      [.] programUndefinedIfUndefOrPoison(llvm::Value const*, bool) [clone .llvm.5409155288196509361]
  13.71%  libLLVM.so.20.1-rust-1.87.0-nightly      [.] isGuaranteedNotToBeUndefOrPoison(llvm::Value const*, llvm::AssumptionCache*, llvm::Instruction const*, llvm::DominatorTree const*, unsigned int, UndefPoisonKind) [clone .llvm.5409155288196509361]
   0.78%  libLLVM.so.20.1-rust-1.87.0-nightly      [.] canCreateUndefOrPoison(llvm::Operator const*, UndefPoisonKind, bool) [clone .llvm.5409155288196509361]
   0.62%  libLLVM.so.20.1-rust-1.87.0-nightly      [.] llvm::Operator::hasPoisonGeneratingAnnotations() const
   0.45%  libLLVM.so.20.1-rust-1.87.0-nightly      [.] llvm::propagatesPoison(llvm::Use const&)
   0.20%  libLLVM.so.20.1-rust-1.87.0-nightly      [.] llvm::getKnowledgeForValue(llvm::Value const*, llvm::ArrayRef<llvm::Attribute::AttrKind>, llvm::AssumptionCache*, llvm::function_ref<bool (llvm::RetainedKnowledge, llvm::Instruction*, llvm::CallBase::B

This is not a hang: the build completes after 20 minutes (phew).

removed needs-triage (This issue may need triage. Remove it if it has been sufficiently triaged.) on Mar 3, 2025

moxian (Contributor) commented on Mar 3, 2025

This can be somewhat "minimized" to just the scanners::autolink_email function, which has no external dependencies and can be tested in isolation (when decorated with #[no_mangle] or similar).

It's a huge match statement, and I would guess codegen takes polynomial time somehow?
Limiting the match to the first 70 cases (0-69) makes the function compile in 85 seconds on my machine (~instant on stable).
Limiting it to the first 60 cases (0-59) drops the time down to 40 seconds.
Limiting it to the first 50 cases (0-49) drops it further to 16 seconds.

Having all 128 match arms, but deleting either arm 2 (2=>{return None;}) or arm 11 (11=>{return Some(cursor);}), makes it compile near-instantly, which is probably not surprising since those are the only two return paths.
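
As a rough check on the "polynomial time" guess, the three timings quoted above (16, 40 and 85 seconds for 50, 60 and 70 arms) can be turned into a local power-law exponent. This is only a back-of-the-envelope sketch: it assumes the time behaves like t = c * n^k, it relies on single measurements from one machine, and `local_exponent` is a hypothetical helper that does not appear anywhere in the issue.

```rust
/// Local exponent k of an assumed power law t = c * n^k, estimated from two
/// (arm count, seconds) samples as ln(t2 / t1) / ln(n2 / n1).
fn local_exponent(a: (f64, f64), b: (f64, f64)) -> f64 {
    (b.1 / a.1).ln() / (b.0 / a.0).ln()
}

fn main() {
    // (number of match arms kept, compile time in seconds) as reported above.
    let samples = [(50.0, 16.0), (60.0, 40.0), (70.0, 85.0)];
    for pair in samples.windows(2) {
        let k = local_exponent(pair[0], pair[1]);
        println!("{} -> {} arms: exponent ~ {:.2}", pair[0].0, pair[1].0, k);
    }
}
```

For these samples both exponents come out close to 5, which is consistent with steeply superlinear growth. Three points cannot distinguish a high-degree polynomial from an exponential, so this says nothing about the actual cause.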


Minimized further:

#[no_mangle]
pub fn autolink_email(s: &[u8]) -> Option<usize> {
    let mut cursor = 0;
    let mut marker = 0;
    let len = s.len();

    let mut yych: u8 = 0;
    let mut yystate: usize = 0;
    'yyl: loop {
        match yystate {
            0 => {
                return None;
            }
            1 => {
                // changing this to `return cursor`, the above to `return 1234` and the return type to `->usize`
                // makes the code compile fast again
                return Some(cursor);
            }
            2 => match yych {
                _ => {
                    yystate = 6;
                    continue 'yyl;
                }
            },
            3 => {
                marker = cursor;
                continue 'yyl;
            }
            4 => {
                cursor = marker;
                yystate = 2;
                continue 'yyl;
            }


            // add some extra copies of the block if your machine is too fast to notice the slowdown
            10 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            11 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            12 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            13 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            14 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            15 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            16 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            17 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            18 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            19 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            20 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            21 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            22 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            23 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            24 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            25 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            26 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            27 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            28 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            29 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            30 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            31 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            32 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            33 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            34 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            35 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            36 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            37 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            38 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
            39 => {
                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }


            // regular `panic!()` also works, but a `return` doesn't
            _ => unsafe{ core::hint::unreachable_unchecked() },
        }
    }
}

fn main() {}
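
The comment about adding extra copies of the block can be automated. The following is a sketch of a small generator that prints the reduced reproducer with a chosen number of identical load arms (states 10, 11, and so on), so that the arm count can be varied when timing the build. The names `generate_repro`, `HEADER`, `ARM_BODY` and `FOOTER` are made up for this illustration, and the explanatory comments from the listing above are omitted from the generated source.

```rust
use std::env;

// Fixed prologue of the reproducer: states 0-4 exactly as in the listing above.
const HEADER: &str = r#"#[no_mangle]
pub fn autolink_email(s: &[u8]) -> Option<usize> {
    let mut cursor = 0;
    let mut marker = 0;
    let len = s.len();

    let mut yych: u8 = 0;
    let mut yystate: usize = 0;
    'yyl: loop {
        match yystate {
            0 => {
                return None;
            }
            1 => {
                return Some(cursor);
            }
            2 => match yych {
                _ => {
                    yystate = 6;
                    continue 'yyl;
                }
            },
            3 => {
                marker = cursor;
                continue 'yyl;
            }
            4 => {
                cursor = marker;
                yystate = 2;
                continue 'yyl;
            }
"#;

// Body shared by every repeated arm (everything after the `N => {` line).
const ARM_BODY: &str = "                yych = unsafe { if cursor < len { *s.get_unchecked(cursor) } else { 0 } };
                continue 'yyl;
            }
";

// Fixed epilogue: the unreachable fallback arm and an empty `main`.
const FOOTER: &str = r#"            _ => unsafe { core::hint::unreachable_unchecked() },
        }
    }
}

fn main() {}
"#;

/// Returns the reproducer source with `extra_arms` repeated arms,
/// numbered from state 10 upwards.
fn generate_repro(extra_arms: usize) -> String {
    let mut src = String::from(HEADER);
    for state in 10..10 + extra_arms {
        src.push_str(&format!("            {} => {{\n", state));
        src.push_str(ARM_BODY);
    }
    src.push_str(FOOTER);
    src
}

fn main() {
    // First CLI argument is the arm count; 30 matches the listing above.
    let extra_arms = env::args().nth(1).and_then(|a| a.parse().ok()).unwrap_or(30);
    print!("{}", generate_repro(extra_arms));
}
```

Redirecting the output to a file such as src/main.rs and compiling it with different arm counts gives a way to observe how the compile time scales on a given toolchain.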

LLVM pass timings:

> rustc +nightly -Copt-level=0 src/main.rs --emit=mir,llvm-ir -C strip=debuginfo  -C extra-filename=-slow-0
> clang.exe .\main-slow-0.ll -O3 -ftime-report
warning: overriding the module target triple with x86_64-pc-windows-msvc19.42.34435 [-Woverride-module]
===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 14.3594 seconds (14.3573 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   7.2344 ( 50.4%)   7.2344 ( 50.4%)   7.2246 ( 50.3%)  EarlyCSEPass
   7.1250 ( 49.6%)   7.1250 ( 49.6%)   7.1274 ( 49.6%)  SROAPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)  SimplifyCFGPass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)  IPSCCPPass
<.. nothing of interest in the rest of the output ..>

apiraino (Contributor) commented on Mar 3, 2025

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

added P-high (High priority) and removed I-prioritize (Issue: Indicates that prioritization has been requested for this issue.) on Mar 3, 2025
self-assigned this on Mar 3, 2025

Metadata

Labels

A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
C-bug (Category: This is a bug.)
I-compiletime (Issue: Problems and improvements with respect to compile times.)
P-high (High priority)
llvm-fixed-upstream (Issue expected to be fixed by the next major LLVM upgrade, or backported fixes)
regression-from-stable-to-nightly (Performance or correctness regression from stable to nightly.)

Participants

@parasyte, @apiraino, @moxian, @dianqk, @saethlin

Issue #137909 · rust-lang/rust