Description
With #67184 we changed the logic of how case-insensitive comparisons are made between the pattern and the input. Before that change, we called `ToLower()` on the pattern and `ToLower()` on the input and checked whether they matched, but this design had several problems (see #61048 for more details). Since that change, we use a casing equivalence table built into the assembly which tells us, given a character `c`, which characters should be considered equivalent to `c`. Using that table, we can transform patterns like `ABC` into `[Aa][Bb][Cc]`.
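To illustrate the idea, here is a minimal Python sketch of expanding a literal pattern with an equivalence table. The table (`EQUIVALENCES`) and the helpers `equivalents` and `expand_literal` are hypothetical: a few hand-picked entries stand in for the much larger table built into the assembly, and this is not the actual .NET implementation. It also assumes the pattern contains only literal characters, so regex metacharacters are not escaped.

```python
# Hypothetical excerpt of a case-equivalence table: each character maps to the
# full set of characters that should match it when ignoring case.
EQUIVALENCES: dict[str, frozenset[str]] = {}


def _add_group(*chars: str) -> None:
    """Register all given characters as mutually equivalent."""
    group = frozenset(chars)
    for ch in chars:
        EQUIVALENCES[ch] = group


_add_group("A", "a")
_add_group("B", "b")
_add_group("C", "c")
_add_group("K", "k", "\u212a")  # U+212A KELVIN SIGN grouped with K/k


def equivalents(ch: str) -> frozenset[str]:
    """Characters equivalent to ch; a char with no entry matches only itself."""
    return EQUIVALENCES.get(ch, frozenset(ch))


def expand_literal(pattern: str) -> str:
    """Turn each literal char into a character class of its equivalents."""
    parts = []
    for ch in pattern:
        group = equivalents(ch)
        if len(group) == 1:
            parts.append(ch)  # no case variants: keep the char as-is
        else:
            parts.append("[" + "".join(sorted(group)) + "]")
    return "".join(parts)


print(expand_literal("ABC"))  # [Aa][Bb][Cc]
```

Because the table is data rather than a call into the platform's casing APIs, the expansion comes out the same regardless of the culture or globalization mode the process runs under.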
That said, there is still one scenario in which we do not yet use the case equivalence table to perform case-insensitive comparisons: when the pattern contains backreferences. For example, a pattern like `(A)\1` is now transformed into `([Aa])\1`. If you use that pattern to match an input like `aA`, the engine must compare the next character in the input case-insensitively against what group 1 previously captured. This comparison does not use the case equivalence table today, mainly because doing so would require either exposing the table publicly so that the source generator engine could consume it (given that source-generated code would consume it from an external assembly), or not supporting `IgnoreCase` backreference patterns altogether in the source generator engine.
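The comparison that a case-insensitive backreference needs can be sketched the same way: instead of lowercasing both sides, check each input character against the equivalence set of the corresponding captured character. This Python sketch hand-codes a matcher for `([Aa])\1` only; the toy table and all function names are hypothetical and purely illustrative.

```python
# Hypothetical toy equivalence table standing in for the real, built-in one.
EQUIVALENCES: dict[str, frozenset[str]] = {
    "A": frozenset("Aa"),
    "a": frozenset("Aa"),
}


def chars_equivalent(captured: str, candidate: str) -> bool:
    """True if candidate is case-equivalent to captured per the table."""
    return candidate in EQUIVALENCES.get(captured, frozenset(captured))


def backreference_matches(captured: str, text: str, pos: int) -> bool:
    """Does text repeat the captured group at pos, ignoring case?"""
    end = pos + len(captured)
    if end > len(text):
        return False  # not enough input left to repeat the capture
    return all(
        chars_equivalent(c, text[pos + i]) for i, c in enumerate(captured)
    )


def match_group_then_backreference(text: str) -> bool:
    """Hand-written matcher for ([Aa])\\1 anchored at the start of text."""
    if not text or text[0] not in "Aa":
        return False
    captured = text[0]  # what group 1 matched
    return backreference_matches(captured, text, 1)


print(match_group_then_backreference("aA"))  # True
print(match_group_then_backreference("ab"))  # False
```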
For now, we have opted to make case-insensitive backreference comparisons fall back to `TextInfo(a).ToLower() == TextInfo(b).ToLower()`, which should behave the same as our Regex case equivalence table in the vast majority of cases. However, we did find cases where this can be inconsistent with the table: when the application is running on Windows 7 or Windows 8.1, or when it is using NLS globalization. What these three cases have in common is that they all use NLS for case conversions, and we found that in the InvariantCulture there are around 11 cases where NLS case conversions don't match those produced when running with ICU globalization (and hence don't match the Regex casing table).
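Why the fallback can diverge is easier to see side by side. The Python sketch below compares a lowercase-then-compare check, parameterized by whichever lowercasing function is active, with a fixed table lookup. `limited_lower` is a purely hypothetical casing source that leaves U+212A KELVIN SIGN unchanged; it stands in for "casing data that differs from the table" and is not a claim about which characters are among the roughly 11 NLS mismatches. Python's `str.lower` applies Unicode's default lowercase mappings.

```python
from typing import Callable

# Hypothetical toy table: KELVIN SIGN is equivalent to K and k.
EQUIVALENCES: dict[str, frozenset[str]] = {
    ch: frozenset("Kk\u212a") for ch in "Kk\u212a"
}


def table_equal(a: str, b: str) -> bool:
    """Case-insensitive equality via the fixed equivalence table."""
    return b in EQUIVALENCES.get(a, frozenset(a))


def lower_equal(a: str, b: str, to_lower: Callable[[str], str]) -> bool:
    """Fallback style: lowercase both sides with the active casing data."""
    return to_lower(a) == to_lower(b)


def limited_lower(ch: str) -> str:
    """Hypothetical casing source with no lowercase mapping for U+212A."""
    return ch if ch == "\u212a" else ch.lower()


pair = ("\u212a", "k")
print(table_equal(*pair))                 # True
print(lower_equal(*pair, str.lower))      # True
print(lower_equal(*pair, limited_lower))  # False: the two approaches diverge
```

`table_equal` gives the same answer everywhere because the table ships with the code, while `lower_equal` depends entirely on which casing data `to_lower` draws from.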
This issue tracks re-evaluating the decision not to use the Regex casing table when matching backreferences, and is meant to start the discussion of what we should do for .NET 7 RTM.