e.g.
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
        var r = new Regex(@"[A-Z]", RegexOptions.IgnoreCase);
        Console.WriteLine(r.IsMatch("\u0131")); // should print true, but prints false
    }
}
In Turkish, I lowercases to ı (\u0131), so the above repro should print true. However, although Regex uses the target culture when dealing with individual characters in a set:
Lines 551 to 556 in fd82afe
SingleRange range = rangeList[i];
if (range.First == range.Last)
{
    char lower = culture.TextInfo.ToLower(range.First);
    rangeList[i] = new SingleRange(lower, lower);
}
when it instead has a range with multiple characters, it delegates to this AddLowercaseRange function:
Line 569 in fd82afe
private void AddLowercaseRange(char chMin, char chMax)
which doesn't factor the target culture into its decision, instead using a precomputed table:
Line 301 in fd82afe
private static readonly LowerCaseMapping[] s_lcTable = new LowerCaseMapping[]
@tarekgh, @GrabYourPitchforks, am I correct that such a table couldn't possibly be right, given that different cultures case differently?
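To make the culture dependence concrete, here is a minimal sketch that lowercases the same character under two cultures using TextInfo.ToLower. The CasingDemo class and LowerWith helper are hypothetical names used only for illustration. It assumes the runtime has full globalization data available (i.e. it is not running in invariant globalization mode), in which case one would expect U+0131 for tr-TR and U+0069 for en-US.

```csharp
using System;
using System.Globalization;

static class CasingDemo
{
    // Hypothetical helper: lowercases one character using the casing
    // rules of the named culture.
    public static char LowerWith(string cultureName, char c) =>
        CultureInfo.GetCultureInfo(cultureName).TextInfo.ToLower(c);

    static void Main()
    {
        char turkish = LowerWith("tr-TR", 'I');
        char english = LowerWith("en-US", 'I');

        // Print code points rather than glyphs so the result does not
        // depend on the console's output encoding.
        Console.WriteLine($"tr-TR: U+{(int)turkish:X4}, en-US: U+{(int)english:X4}");
    }
}
```

If the two code points differ, a single culture-agnostic lowercase table cannot produce the right answer for both cultures.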
Note that if the above repro is instead changed to spell out the whole range of uppercase letters:
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
        var r = new Regex(@"[ABCDEFGHIJKLMNOPQRSTUVWXYZ]", RegexOptions.IgnoreCase);
        Console.WriteLine(r.IsMatch("\u0131")); // prints true
    }
}
it then correctly prints true.
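The difference between the two repros can be mimicked outside of Regex. The following is only a sketch, not how Regex is implemented: a hypothetical LowerEach helper lowercases every character of a range individually with a given culture, which is effectively what happens when the set is spelled out. It assumes full globalization data is available for tr-TR.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class RangeLoweringSketch
{
    // Hypothetical helper (not part of System.Text.RegularExpressions):
    // lowercases each character in [chMin, chMax] one at a time using
    // the supplied culture, and collects the results.
    public static HashSet<char> LowerEach(char chMin, char chMax, CultureInfo culture)
    {
        var result = new HashSet<char>();
        // Loop over int so that chMax == char.MaxValue cannot overflow.
        for (int c = chMin; c <= chMax; c++)
        {
            result.Add(culture.TextInfo.ToLower((char)c));
        }
        return result;
    }

    static void Main()
    {
        HashSet<char> turkish = LowerEach('A', 'Z', CultureInfo.GetCultureInfo("tr-TR"));
        HashSet<char> invariant = LowerEach('A', 'Z', CultureInfo.InvariantCulture);

        bool turkishHasDotlessI = turkish.Contains('\u0131');
        bool invariantHasDotlessI = invariant.Contains('\u0131');

        Console.WriteLine($"tr-TR set contains U+0131: {turkishHasDotlessI}");
        Console.WriteLine($"invariant set contains U+0131: {invariantHasDotlessI}");
    }
}
```

A per-character walk like this is linear in the size of the range, which is presumably why a precomputed table is used for ranges; the sketch is meant only to show where the culture would need to enter the decision.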