-
Notifications
You must be signed in to change notification settings - Fork 393
Bugfix/parse mapped abbrev #265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/parse mapped abbrev #265
Conversation
Match exactly one of the set of allowable units
Been a bit busy, will get to this soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, I like the approach. A couple of comments.
public void ParseLengthUnitAbbreviationWithNumbers(string s) | ||
{ | ||
UnitSystem.ClearCache(); | ||
UnitSystem unitSystem = UnitSystem.GetCached("en-US"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized a better approach than using UnitSystem.GetCached()
is to add a new ctor overload that allows you to disable default unit abbreviations from being loaded. I just added this in bf11a78. GetCached()
is not ideal, because it is
- Statically cached (problem if running tests in parallel) and it
- Automatically loads all the units defined in
.json
If Length.json
already contained these units you are defining, you don't know for sure if the test passes due to calling MapUnitToAbbreviation
or not.
So I would change this snippet to:
var unitSystem = new UnitSystem("en-US", loadDefaultAbbreviations: false);
unitSystem.MapUnitToAbbreviation(UnitsNet.Units.LengthUnit.Meter, "m2");
[InlineData("_")] | ||
[InlineData("?")] | ||
[InlineData("123")] | ||
[InlineData("od to")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this test case confusing. Does od to
have a particular meaning or are you simply testing whitespace in the abbreviation?
Also, the method name implies testing with numbers but the test cases seem to test that various special characters in the abbreviation work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, indeed you are right. od to
was simply to test a space and the name was confusing. I will change to "SpecialCharacters".
UnitsNet/CustomCode/UnitParser.cs
Outdated
.OrderByDescending(s => s.Length) // Important to order by length -- if "m" is before "mm" and the input is "mm", it will match just "m" and throw invalid string error | ||
.ToArray(); | ||
// Escape special regex characters | ||
for (int i = 0; i < unitAbbreviations.Length; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using Regex.Escape()
instead:
https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.escape.aspx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks like a much better approach. By the looks of it, I will still need a for loop though? If the project is ever updated to .NET Standard 2.0, the for loop can be replaced by:
string unitsRegex = $"({String.Join("|", unitAbbreviations.ForEach(s => Regex.Replace(s)))})";
UnitsNet/CustomCode/UnitParser.cs
Outdated
string regexString = string.Format(@"(?:\s*(?<value>[-+]?{0}{1}{2}{3})?{4}{5}", | ||
numRegex, // capture base (integral) Quantity value | ||
exponentialRegex, // capture exponential (if any), end of Quantity capturing | ||
@"\s?", // ignore whitespace (allows both "1kg", "1 kg") | ||
@"(?<unit>[^\s\d,]+)", // capture Unit (non-whitespace) input | ||
$@"(?<unit>{unitsRegex})", // capture Unit (non-whitespace) input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @
is no longer needed. Update comment to something like // capture Unit by a list of abbreviations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This foreach loop goes away with Regex.Escape. Will also update comment.
UnitsNet/CustomCode/UnitSystem.cs
Outdated
|
||
if (_unitTypeToUnitValueToAbbrevs.TryGetValue(unitType, out unitValueToAbbrevs)) | ||
{ | ||
foreach (var item in unitValueToAbbrevs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can simplify this to return unitValueToAbbrevs.Values.ToArray();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is giving me a compile time error trying to do this. I think it is because unitValueToAbbrevs.Values
is a collection of lists. Is there another way with LINQ that I have not found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right. Try unitValueToAbbrevs.Values.SelectMany(x => x).ToArray()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like it. A good snippet to remember.
/// <param name="unitValue">Enum value for unit.</param> | ||
/// <returns>Unit abbreviations associated with unit.</returns> | ||
[PublicAPI] | ||
public string[] GetAllAbbreviations(Type unitType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat, this method has been requested before.
UnitsNet/CustomCode/UnitParser.cs
Outdated
@@ -37,7 +37,8 @@ internal static class UnitParser | |||
internal static TUnit ParseUnit<TUnit>([NotNull] string str, | |||
[CanBeNull] IFormatProvider formatProvider, | |||
[NotNull] ParseUnit<TUnit> parseUnit, | |||
[NotNull] Func<TUnit, TUnit, TUnit> add) | |||
[NotNull] Func<TUnit, TUnit, TUnit> add, | |||
Type unitType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding another generic parameter instead? And renaming TUnit
to TQuantity
, as per recent discussion on naming conventions in #260.
Suggestion: ParseUnit<TQuantity, TUnit>()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I follow... see if the next commits are what you are expecting. Another idea is to use an interface for quantity, then the TUnit
argument can be used to determine the type with no need for a redundant parameter.
Let me know when it's ready for another review |
It should be ready to review again. |
@anjdreas This is ready for review. |
Hi, I'm on vacation trip until July 30. Will review when I get back. |
I meant July 25 |
No worries; thanks for the update. Have a great trip! |
TQuantity refers to the unit struct type.
Following naming convention #260
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! I did some fixes by committing directly onto your PR.
UnitsNet/CustomCode/UnitParser.cs
Outdated
// Escape special regex characters | ||
for (int i = 0; i < unitAbbreviations.Length; i++) | ||
{ | ||
unitAbbreviations[i] = Regex.Escape(unitAbbreviations[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This for-loop can be removed in favor of
....
.Select(Regex.Escape)
.ToArray();
above.
|
||
internal static class UnitParser | ||
{ | ||
[SuppressMessage("ReSharper", "UseStringInterpolation")] | ||
internal static TUnit ParseUnit<TUnit>([NotNull] string str, | ||
internal static TQuantity ParseUnit<TQuantityEnum, TQuantity>([NotNull] string str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the type parameters to match a naming convention.
Instead of TQuantity, TUnit
it is now TQuantityEnum, TQuantity
.
So the direction I went was to change the regex to in the UnitParser.ParseUnit to match one of any of the abbreviations for the unit type. This seems more robust as it uses UnitSystem abbreviations and allows for more flexible abbreviations (either mapped at run time or generated).
Related issue: #264