Skip to content

Bugfix/parse mapped abbrev #265

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jul 30, 2017
Merged

Bugfix/parse mapped abbrev #265

merged 16 commits into from
Jul 30, 2017

Conversation

bplubell
Copy link
Contributor

@bplubell bplubell commented Jul 9, 2017

So the direction I went was to change the regex to in the UnitParser.ParseUnit to match one of any of the abbreviations for the unit type. This seems more robust as it uses UnitSystem abbreviations and allows for more flexible abbreviations (either mapped at run time or generated).

Related issue: #264

@bplubell
Copy link
Contributor Author

I want to add a few more tests, so this should not be merged quite yet.
I did not realize that the solution to this was basically the same as the solution to #254 and #253 -- I am going to add tests to verify those issues are resolved as well.

@angularsen
Copy link
Owner

Been a bit busy, will get to this soon.

Copy link
Owner

@angularsen angularsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, I like the approach. A couple of comments.

public void ParseLengthUnitAbbreviationWithNumbers(string s)
{
UnitSystem.ClearCache();
UnitSystem unitSystem = UnitSystem.GetCached("en-US");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized a better approach than using UnitSystem.GetCached() is to add a new ctor overload that allows you to disable default unit abbreviations from being loaded. I just added this in bf11a78. GetCached() is not ideal, because it is

  1. Statically cached (problem if running tests in parallel) and it
  2. Automatically loads all the units defined in .json

If Length.json already contained these units you are defining, you don't know for sure if the test passes due to calling MapUnitToAbbreviation or not.

So I would change this snippet to:

var unitSystem = new UnitSystem("en-US", loadDefaultAbbreviations: false);
unitSystem.MapUnitToAbbreviation(UnitsNet.Units.LengthUnit.Meter, "m2");

[InlineData("_")]
[InlineData("?")]
[InlineData("123")]
[InlineData("od to")]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this test case confusing. Does od to have a particular meaning or are you simply testing whitespace in the abbreviation?
Also, the method name implies testing with numbers but the test cases seem to test that various special characters in the abbreviation work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed you are right. od to was simply to test a space and the name was confusing. I will change to "SpecialCharacters".

.OrderByDescending(s => s.Length) // Important to order by length -- if "m" is before "mm" and the input is "mm", it will match just "m" and throw invalid string error
.ToArray();
// Escape special regex characters
for (int i = 0; i < unitAbbreviations.Length; i++)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like a much better approach. By the looks of it, I will still need a for loop though? If the project is ever updated to .NET Standard 2.0, the for loop can be replaced by:
string unitsRegex = $"({String.Join("|", unitAbbreviations.ForEach(s => Regex.Replace(s)))})";

string regexString = string.Format(@"(?:\s*(?<value>[-+]?{0}{1}{2}{3})?{4}{5}",
numRegex, // capture base (integral) Quantity value
exponentialRegex, // capture exponential (if any), end of Quantity capturing
@"\s?", // ignore whitespace (allows both "1kg", "1 kg")
@"(?<unit>[^\s\d,]+)", // capture Unit (non-whitespace) input
$@"(?<unit>{unitsRegex})", // capture Unit (non-whitespace) input
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @ is no longer needed. Update comment to something like // capture Unit by a list of abbreviations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This foreach loop goes away with Regex.Escape. Will also update comment.


if (_unitTypeToUnitValueToAbbrevs.TryGetValue(unitType, out unitValueToAbbrevs))
{
foreach (var item in unitValueToAbbrevs)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can simplify this to return unitValueToAbbrevs.Values.ToArray();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is giving me a compile time error trying to do this. I think it is because unitValueToAbbrevs.Values is a collection of lists. Is there another way with LINQ that I have not found?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. Try unitValueToAbbrevs.Values.SelectMany(x => x).ToArray()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like it. A good snippet to remember.

/// <param name="unitValue">Enum value for unit.</param>
/// <returns>Unit abbreviations associated with unit.</returns>
[PublicAPI]
public string[] GetAllAbbreviations(Type unitType)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, this method has been requested before.

@@ -37,7 +37,8 @@ internal static class UnitParser
internal static TUnit ParseUnit<TUnit>([NotNull] string str,
[CanBeNull] IFormatProvider formatProvider,
[NotNull] ParseUnit<TUnit> parseUnit,
[NotNull] Func<TUnit, TUnit, TUnit> add)
[NotNull] Func<TUnit, TUnit, TUnit> add,
Type unitType)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding another generic parameter instead? And renaming TUnit to TQuantity, as per recent discussion on naming conventions in #260.
Suggestion: ParseUnit<TQuantity, TUnit>()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I follow... see if the next commits are what you are expecting. Another idea is to use an interface for quantity, then the TUnit argument can be used to determine the type with no need for a redundant parameter.

@angularsen
Copy link
Owner

Let me know when it's ready for another review

@bplubell
Copy link
Contributor Author

It should be ready to review again.

@bplubell
Copy link
Contributor Author

@anjdreas This is ready for review.

@angularsen
Copy link
Owner

Hi, I'm on vacation trip until July 30. Will review when I get back.

@angularsen
Copy link
Owner

I meant July 25

@bplubell
Copy link
Contributor Author

No worries; thanks for the update. Have a great trip!

Copy link
Owner

@angularsen angularsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! I did some fixes by committing directly onto your PR.

// Escape special regex characters
for (int i = 0; i < unitAbbreviations.Length; i++)
{
unitAbbreviations[i] = Regex.Escape(unitAbbreviations[i]);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This for-loop can be removed in favor of

....
.Select(Regex.Escape)
.ToArray();

above.


internal static class UnitParser
{
[SuppressMessage("ReSharper", "UseStringInterpolation")]
internal static TUnit ParseUnit<TUnit>([NotNull] string str,
internal static TQuantity ParseUnit<TQuantityEnum, TQuantity>([NotNull] string str,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the type parameters to match a naming convention.
Instead of TQuantity, TUnit it is now TQuantityEnum, TQuantity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants