Skip to content

Conversation

abelbraaksma
Copy link
Contributor

@abelbraaksma abelbraaksma commented Aug 16, 2020

Fixes #38872.

This PR does the following things (as agreed on in #38872 (comment) by @terrajobst):

  • Makes RegexParseException public
  • Adds a private overload for serialization that throws a NotImplementedException (it is serialized as an ArgumentException).
  • Makes enum type RegexParseError public
  • Renames all but a few of the enum fields to more meaningful names
  • Adds Unknown as the zero-field
  • Cleans up the test code (which used reflection to get to this data)
  • Cleans up RegexParseException a bit w.r.t. using automatic properties

Todo:

  • Inline documentation (I need a little help as to what is expected here) done
  • add tests for offset done

Locally tested against test set, but full build with all other tests not yet. However, I did not find dependencies elsewhere, apart from a binary serialization test on the exception.

Tagging: @danmosemsft.

@ghost
Copy link

ghost commented Aug 16, 2020

Tagging subscribers to this area: @eerhardt, @pgovind
See info in area-owners.md if you want to be subscribed.

@abelbraaksma
Copy link
Contributor Author

TypesMustExist : Type 'System.Text.RegularExpressions.RegexParseError' does not exist in the reference but it does exist in the implementation.

Looks like I need to inform the system that there's a new kid in town ;)

@am11
Copy link
Member

am11 commented Aug 16, 2020

Looks like I need to inform the system that there's a new kid in town ;)

That would be the ref file.

@abelbraaksma
Copy link
Contributor Author

@am11 thanks! That's bascially it then?

@am11
Copy link
Member

am11 commented Aug 16, 2020

I think usually that is enough. There might be some target which will complain about compatibility (which may require some mechanical change), but we will find out soon. :)

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Aug 16, 2020

Here's a summary of the renaming of the enum fields in RegexParseError, in case that's needed at some point (the original values were internal-only):

Unknown,
AlternationCantHaveComment     --> AlternationHasComment,
IllegalCondition               --> AlternationHasMalformedCondition,
MalformedReference             --> AlternationHasMalformedReference,
AlternationCantCapture         --> AlternationHasNamedCapture,
TooManyAlternates              --> AlternationHasTooManyConditions,
UndefinedReference             --> AlternationHasUndefinedReference,
InvalidGroupName               --> CaptureGroupNameInvalid,
CapnumNotZero                  --> CaptureGroupOfZero,
SubtractionMustBeLast          --> ExclusionGroupNotLast,
NotEnoughParentheses           --> InsufficientClosingParentheses,
TooManyParentheses             --> InsufficientOpeningParentheses,
TooFewHex                      --> InsufficientOrInvalidHexDigits,
UnrecognizedGrouping           --> InvalidGroupingConstruct,
IncompleteSlashP               --> InvalidUnicodePropertyEscape,
MalformedNameRef               --> MalformedNamedReference,
IncompleteSlashP               --> MalformedUnicodePropertyEscape,
MissingControl                 --> MissingControlCharacter,
NestedQuantify                 --> NestedQuantifiersNotParenthesized,
QuantifyAfterNothing           --> QuantifierAfterNothing,
CaptureGroupOutOfRange         --> QuantifierOrCaptureGroupOutOfRange,
ReversedCharRange              --> ReversedCharacterRange,
IllegalRange                   --> ReversedQuantifierRange,
BadClassInCharRange            --> ShorthandClassInCharacterRange,
UndefinedNameRef               --> UndefinedNamedReference,
UndefinedBackref               --> UndefinedNumberedReference,
IllegalEndEscape               --> UnescapedEndingBackslash,
UnrecognizedControl            --> UnrecognizedControlCharacter,
UnrecognizedEscape             --> UnrecognizedEscape,
UnknownUnicodeProperty         --> UnrecognizedUnicodeProperty,
UnterminatedBracket            --> UnterminatedBracket,
UnterminatedComment            --> UnterminatedComment

@danmoseley
Copy link
Member

danmoseley commented Aug 16, 2020

Copy link
Member

@danmoseley danmoseley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks about right but I'll have to look again when I'm at a computer!

@pgovind
Copy link

pgovind commented Aug 16, 2020

Just a note that if you use the tool, you might end up with a bunch of changes in the ref file. We usually just commit the changes we really need and discard the rest.

@abelbraaksma
Copy link
Contributor Author

@pgovind, only now I see the tool, what I did here was by hand. Thanks for the warning though.

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Aug 16, 2020

error CS0246: (NETCORE_ENGINEERING_TELEMETRY=Build) The type or namespace name 'RegexParseError' could not be found (are you missing a using directive or an assembly reference?)

This happens in RegexParseErrorTests.cs.

Ok, I kinda expected something like that, but also expected that it is public now. If I add a forced using System.Text.RegularExpressions in that file, it simply says that it is not needed. When I run the tests locally, it just compiles. I wonder: do some tests depend on an existing BCL? Because in that case I need to revert the part where I removed the reflection code.

EDIT: I see it is the runtime (Libraries Build Windows_NT net48 x86 Release). Not sure what to do here.

@am11
Copy link
Member

am11 commented Aug 16, 2020

That is happening only for net48, which I suspected might. :)
I think we would need to #if NET48 in few places in the affected tests.

@abelbraaksma
Copy link
Contributor Author

I think we would need to #if NET48 in few places in the affected tests.

That would work. Thanks!

@abelbraaksma
Copy link
Contributor Author

.packages/microsoft.dotnet.helix.sdk/5.0.0-beta.20407.3/tools/Microsoft.DotNet.Helix.Sdk.MultiQueue.targets(76,5): error : (NETCORE_ENGINEERING_TELEMETRY=Test) Work item cd0ab83a-cbb2-4c93-bec2-e5581e4b7cfb/System.Net.Http.Functional.Tests in job cd0ab83a-cbb2-4c93-bec2-e5581e4b7cfb has failed.
Failure log: https://helix.dot.net/api/2019-06-17/jobs/cd0ab83a-cbb2-4c93-bec2-e5581e4b7cfb/workitems/System.Net.Http.Functional.Tests/console

Next one up, this one I've no clue yet. Guess I'll have to dive in the logs, or it is some fluke...

@am11
Copy link
Member

am11 commented Aug 16, 2020

That could be an intermittent failure, unrelated to PR changes.

@danmoseley
Copy link
Member

As you saw, the tests run on .NET Framework as well purely for comparison. I think #if NET48 should work but existing tests use [SkipOnTargetFramework(TargetFrameworkMonikers.NetFramework)] or if (!PlatformDetection.IsNetFramework)

To verify, you should run tests locally on .NET Framework before pushing that change. This seems to work for me

cd System.Text.RegularExpressions\tests
dotnet build /p:BuildTargetFramework=net472 
dotnet build /t:test     
... later
dotnet build /p:BuildTargetFramework=net50
dotnet build /t:test     

When I try that, the second test run seems to test both (?) .. this should be simpler/better documented, I'll ask about it

A random, unrelated failure like in Http tests can be ignored: no need to run CI again for that. CI runs a vast number of tests/legs and will never be 100% stable.

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Aug 16, 2020

As you saw, the tests run on .NET Framework as well purely for comparison. I think #if NET48

Apparently, already part of the tests were in a non-NET48 section, so I ended up just moving the file in the csproj file. Verified locally that they didn't load anymore. Hope that fixes that part.

and will never be 100% stable.

Good to know :)

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Aug 16, 2020

I'm not entirely confident about adding the switch statement for the default message. It kinda feels odd. As mentioned above, there are some messages that expect extra info, which I've left as-is. My estimate is that if someone wants to use the exception-ctor, they are far more likely to do it to have extra control over the message, which we don't give them here anyway. So their only way out is to come up with their own exception.

I wouldn't feel bad if we drop the extra ctor altogether, just saying ;).

I'll look into adding the offset-data to the test coverage. it wasn't done before, but it looks like there was an effort underway to test for offset info, just never fully implemented.

@abelbraaksma
Copy link
Contributor Author

so I assume we can add that safely

@pgovind Do note, that all error messages are already properly constructed right now, except for the public ctor, which isn't used internally.

I'd suggest writing the error message population code once there's new agreement on the proper ctors,otherwise we run the risk doing it twice.

@abelbraaksma
Copy link
Contributor Author

In the interest of time, can you go ahead and make that change anyway (along with populating the error messages with the pattern)?

@pgovind Only read that edit now. Sure, np (and then I sure hope you can convince them ;) )

@pgovind
Copy link

pgovind commented Aug 17, 2020

Alright, I got semi-official sign-off from @GrabYourPitchforks (granted he only went through the following summary) for:

Don't implement 

public RegexParseException(RegexParseError error, int offset);

but instead implement

internal RegexParseException(string pattern, RegexParseError error, int offset);

Is that ok? Do you see any potential issues with this?
 
The pattern is needed to populate the error message correctly. We'd end up with
1) no public constructor on the type
2) Room to add a Pattern property
3) Re-consider making the constructor public in .NET 6

If we can make those changes in time, I'll re-approve and then @danmosemsft can still likely get this in :)

Note: Levi did mention the following that in .NET6, we likely wouldn't want string pattern as the first parameter in a public constructor, but that's an issue for the future with a trivial fix :)

@abelbraaksma
Copy link
Contributor Author

Could one of you look at the comments starting from #40902 (comment) to get an idea of where we are?

To summarize:

  • There's sentiment (@stephentoub) to remove the public ctor public RegexParseException(RegexParseError error, int offset), as it doesn't contain an argument for pattern which is used for each exn message based on a parse error:
  • The idea is to remove it for .NET 5.0
  • And to add a more versatile constructor for .NET 6.0, and add a property string Pattern to contain the error pattern.
  • An alternative solution is to leave it in, for which there are two options:
  1. As-is: which creates a default message which includes the enum
  2. Render an error as close as possible to what an error with that RegexParseError enum would look like if it were thrown internally. We've code in place for this, it would just needed to be merged back in.

@abelbraaksma
Copy link
Contributor Author

(again we posted at the same time 😆 )

Is that ok? Do you see any potential issues with this?

Yes, no problem with that 🙌

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Aug 17, 2020

I'll start implementing the internal ctor. One detail I'd need an agreement on though:

  • I'd prefer to use all enums and where there's a placeholder, either to leave it empty or use ? (question mark)
  • EDIT: chosen this option: Alternative 1: for these 11 or so enums, I can create variant resources without the placeholder
  • Alternative 2: for these 11 enums, we default to the message for Unknown.

@pgovind, what do you think?

@pgovind
Copy link

pgovind commented Aug 17, 2020

Hmmm, maybe I'm not understanding, but if there is a placeholder, you can just use the pattern right?

@abelbraaksma
Copy link
Contributor Author

@pgovind

These ones have no additional info (#40902 (comment)), the others do, like for instance if you have a non-existing unicode property, the error will also show you that wrong property in an additional placeholder.

I don't think it matters much for now, really, as we've decided above to create the internal constructor with the necessary logic, but it won't be called anywhere (the original exceptions remain as they were, of course).

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Aug 17, 2020

I went for the generic version of specific message. I.e., if the resource string requires an extra param:

  <data name="UndefinedNumberedReference" xml:space="preserve">
    <value>Reference to undefined group number {0}.</value>
  </data>

Then I add:

  <data name="UndefinedNumberedReferenceNoPlaceholder" xml:space="preserve">
    <value>Reference to undefined group number.</value>
  </data>

This will also help in the future adding this overload.

@pgovind
Copy link

pgovind commented Aug 17, 2020

Oh now I see what you mean. Yes, I think that's fine :)

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Aug 17, 2020

@pgovind, @danmosemsft, @stephentoub, done. Could you re-review? 😃

I had one small issue with the ref file, it didn't accept the absence of a ctor (it required a default ctor to be there). By adding a private ctor to the ref file, it appeared to be fine and it compiled, but I don't know if that's "the right way".

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Aug 17, 2020

Basically, these last two commits should resolve the "semi-official" agreement with @GrabYourPitchforks / @pgovind above:

Don't implement
public RegexParseException(RegexParseError error, int offset);

but instead implement
internal RegexParseException(string pattern, RegexParseError error, int offset);

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Aug 17, 2020

Looking. Checking it out locally to see if we really need 2 internal constructors.

Not yet: the internal ctor would only be a placeholder for a later public one. It isn't referenced anywhere. The original code that throws the exceptions uses the other internal exception already.

Since there's a difference in behavior: extra data is n/a for the later-to-be-externalized ctor, they should be split. Though the logic could be improved (we could centralize it for all exceptions), I'm not sure that helps much right now.

Copy link

@pgovind pgovind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, re-approved. LGTM.

@danmosemsft , when CI finishes, I'm good with merging this.

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Aug 17, 2020

Mono build "Libraries Test Run release mono OSX x64 Debug" gives one of those weird ones again (doesn't seem to be related in any way):

.packages/microsoft.dotnet.helix.sdk/5.0.0-beta.20407.3/tools/Microsoft.DotNet.Helix.Sdk.MultiQueue.targets(76,5): error : (NETCORE_ENGINEERING_TELEMETRY=Test) Work item ed76f7d7-77b1-407c-bb04-d567f901f8a7/System.Drawing.Common.Tests in job ed76f7d7-77b1-407c-bb04-d567f901f8a7 has failed.
Failure log: https://helix.dot.net/api/2019-06-17/jobs/ed76f7d7-77b1-407c-bb04-d567f901f8a7/workitems/System.Drawing.Common.Tests/console

@abelbraaksma
Copy link
Contributor Author

@danmosemsft, all checks past, except that intermittent one above.

@danmoseley danmoseley merged commit e26d013 into dotnet:master Aug 17, 2020
@danmoseley
Copy link
Member

Thanks @abelbraaksma for your efforts. given it only just missed the cutoff, I'm OK porting it when the branch is up.

@danmoseley
Copy link
Member

danmoseley commented Aug 17, 2020

@Anipik is the RC1 branch up and ready for PR's? (I assume we missed the cut)

@abelbraaksma
Copy link
Contributor Author

given it only just missed the cutoff, I'm OK porting it when the branch is up.

Oh, that's too bad, so it won't be in 5.0?

is the RC1 branch up and ready for PR's

Uh wait, you mean it'll be in the release candidate for 5.0, right?

@danmoseley
Copy link
Member

I mean I am OK porting it if we can do it today-ish. It's marginal, but it's pretty hard that it missed by 20 mins.

@abelbraaksma
Copy link
Contributor Author

but it's pretty hard that it missed by 20 mins.

Indeed it is, that's basically just the last commit ;)

I mean I am OK porting it if we can do it today-ish.

That'd be super!!

@Anipik
Copy link
Contributor

Anipik commented Aug 17, 2020

@Anipik is the RC1 branch up and ready for PR's? (I assume we missed the cut)

It made a cutoff, testing a couple of infra changes. But the snap should be in 30 minutes or so.

@danmoseley
Copy link
Member

OK, it's in. Great.

REDMOND+danmose@LAPTOP /c/git/runtime (master)
$ git log -3 --oneline remotes/upstream/release/5.0-rc1 | grep "Regex"
e26d0131b23 Make RegexParseException and RegexParseError public, rename enum fields (#40902)

@abelbraaksma
Copy link
Contributor Author

Thanks for all the help @danmosemsft and all others, couldn't have done it so fast without it!

@abelbraaksma abelbraaksma deleted the runtime-open-regexparseerror branch August 18, 2020 17:09
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make enum RegexParseError and RegexParseException public
7 participants