Skip to content

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Jul 14, 2025

Lazy loops can be made automatically atomic in some situations by the optimizer, in which case their handling significantly simplifies, because a lazy atomic loop just becomes a repeater for the min iteration count. Most viable lazy loops are caught by the optimizer, but some aren't yet are determined to be treatable as atomic at emit time. EmitLazy was handling such cases incorrectly, resulting in a missing branch target and compilation failing. This fixes that two fold:

  1. The optimizer is improved, so the discovered tests cases that were triggering this case no longer do.
  2. The distinction is eliminated from EmitLazy, as the case is rare and it would take a lot more code to optimize for that case.

Best reviewed without whitespace (indentation changed on a few large sections).

Fixes #117601

…e generator

Lazy loops can be made automatically atomic in some situations by the optimizer, in which case their handling significantly simplifies, because a lazy atomic loop just becomes a repeater for the min iteration count. Most viable lazy loops are caught by the optimizer, but some aren't yet are determined to be treatable as atomic at emit time. EmitLazy was handling such cases incorrectly, resulting in a missing branch target and compilation failing. This fixes that two fold:
1. The optimizer is improved, so the discovered tests cases that were triggering this case no longer do.
2. The distinction is eliminated from EmitLazy, as the case is rare and it would take a lot more code to optimize for that case.
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Fixes handling of late-discovered atomic lazy loops by improving the optimizer and removing special‐case logic in lazy emission.

  • Adds new functional tests for various lookbehind and lazy‐loop scenarios.
  • Refactors RegexNode optimizations to consistently use rootNode.Options and apply RTL checks per case.
  • Simplifies EmitLazy in both compiler and generated emitter by removing the isAtomic branch.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
Regex.Match.Tests.cs Added tests covering edge cases in lookbehinds and lazy loops.
RegexNode.cs Updated FinalOptimize and EliminateEndingBacktracking guards and RTL logic.
RegexCompiler.cs Removed isAtomic checks and assertion in EmitLazy.
RegexGenerator.Emitter.cs Mirrored compiler changes in generated emitter, removed isAtomic branch.
Comments suppressed due to low confidence (2)

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs:414

  • By only checking for NonBacktracking here, RTL loops will still enter backtracking elimination and may receive Oneloop/Setloop atomic optimizations that haven’t been vetted for RTL. Consider restoring the RTL guard or adding per-case RTL checks to prevent incorrect behavior in right-to-left mode.
                (Options & RegexOptions.NonBacktracking) != 0)

@stephentoub
Copy link
Member Author

/ba-g browser wasm timeouts

@stephentoub stephentoub merged commit 1b243b7 into dotnet:main Jul 24, 2025
81 of 87 checks passed
@stephentoub stephentoub deleted the fixlazyatomic branch July 24, 2025 02:10
@github-actions github-actions bot locked and limited conversation to collaborators Aug 23, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compiled regex patterns like "(?<=()*?)a" thow exception Bad label content in ILGenerator.
2 participants