Proposal: Use I-Regexp instead of ECMA-262 #1327
I like the idea. I'm assuming our first stable release will come before this is stable, so I'm wondering whether we can adopt this later without breaking compatibility. We might need to make some wording changes to leave room for a future change like this, but I think we can make it work.
This sounded like a good idea until I saw this:
Unless I'm missing something, that means that I-Regexps are anchored at both ends by default, which would be a huge breaking change. That's a completely different mindset for thinking about regexes, and it's not how they work in Perl, Python, Ruby, or JavaScript. Please correct me if I'm misunderstanding here, but if this is accurate, I'm a no.
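To make the anchoring difference concrete, here is a minimal sketch that uses Python's `re` module as a stand-in (it is not an ECMA-262 engine). `re.search` models JSON Schema's `pattern`, which succeeds if the regex matches anywhere in the string; `re.fullmatch` models the implicit whole-string anchoring of XSD-style regexes such as I-Regexp. The function names and the date-like pattern are hypothetical, chosen only for illustration.

```python
import re


def pattern_keyword_matches(pattern: str, instance: str) -> bool:
    # JSON Schema "pattern" semantics: the regex may match anywhere in the string.
    return re.search(pattern, instance) is not None


def implicitly_anchored_matches(pattern: str, instance: str) -> bool:
    # XSD / I-Regexp style: the regex must match the entire string.
    return re.fullmatch(pattern, instance) is not None


# Hypothetical, unanchored date-like pattern.
date_like = "[0-9]{4}-[0-9]{2}-[0-9]{2}"

print(pattern_keyword_matches(date_like, "due 2024-01-15"))      # True
print(implicitly_anchored_matches(date_like, "due 2024-01-15"))  # False
```

Under the current `pattern` semantics, an author who wants whole-string matching adds `^` and `$` explicitly. Under implicit anchoring, the same unanchored pattern would start rejecting inputs it used to accept, which is the breaking change described above.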
I'll bring that up, @handrews, and report back. Good catch.
[EDIT: Never mind this comment, I was reading things wrong]
iregexp is for matching regular expressions, which in practice always require anchors (except for rather unusual cases). Since there is no consensus on anchors, the best approach is to leave out this redundant noise. Note that this is not an innovation: XSD regexps have always done this.
@cabo the issue at hand is not one of requiring the most interoperable regexes. Schema authors can decide their own tradeoffs for that. The issue is whether to break a feature that has had the same behavior (in terms of anchoring and using ECMA as a reference) since the very beginning of JSON Schema.
Sure. I don't have an opinion on that. Of course, you can standardize on bracketing an iregexp with ^ and $ for backward compatibility (I don't think anyone will notice that "." does not match LS/PS in ECMAScript but does in iregexp). You could also use a different JSON member name to introduce iregexps, but that doesn't help you with the old member name.
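The bracketing translation mentioned above can be sketched as follows. This is a hypothetical helper that assumes valid I-Regexp input and assumes that I-Regexp's "." (like XSD's) excludes only LF and CR. It rewrites each unescaped "." outside a character class to `[^\n\r]`, so the result still matches LS/PS in an engine whose "." does not, and then wraps the whole expression as `^(?:` + expression + `)$`. The non-capturing group matters: without it, `a|b` would become `^a|b$`, where each anchor binds to only one alternative. Python's `re` is used only to exercise the output. Note that Python's `$` also matches just before a trailing newline, unlike ECMA-262's `$` without the `m` flag.

```python
import re


def iregexp_to_search_pattern(iregexp: str) -> str:
    """Hypothetical helper: adapt an I-Regexp for an unanchored (search-style) engine.

    Sketch only; assumes `iregexp` is already valid I-Regexp.
    """
    out = []
    in_class = False
    i = 0
    while i < len(iregexp):
        ch = iregexp[i]
        if ch == "\\" and i + 1 < len(iregexp):
            # Copy escapes verbatim, so "\." stays a literal dot.
            out.append(iregexp[i:i + 2])
            i += 2
            continue
        if in_class:
            # Inside [...], "." is literal; just watch for the closing bracket.
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == ".":
            # Outside a class, spell out "any character except LF and CR".
            ch = r"[^\n\r]"
        out.append(ch)
        i += 1
    # Group before anchoring so alternation stays inside both anchors.
    return "^(?:" + "".join(out) + ")$"


translated = iregexp_to_search_pattern("a.b|c")
print(translated)  # ^(?:a[^\n\r]b|c)$
```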
Oh yeah, I agree. That's why I edited out the comment about "."; I had just read that part wrong. I-Regexp doesn't do what I was worried about with that comment anyway. But right now However, an While there are always a few people who think regexes should be anchored by default (presumably coming from the XSD world), I'd say the vast majority of people who use
Hi @handrews, in the IEC standards that are pending publication, I know that regex patterns are used extensively for dates (i.e., date, datetime, duration, month, etc.) to ensure that the string representations are expressed as their respective ISO 8601-compliant equivalents. This is because dates are not native primitives in JSON. I recall extensive testing with the French N.C., and I will need to double-check where that landed. Sorry, I just came across this thread now. ~Todd
@tviegut this isn't about regular expressions in general. We want to use, and currently are using, regex. That's not in question. This issue is about which flavor of regex we want to support in the spec. Currently, we have ECMA-262, but support for that is inconsistent across languages and platforms. We need something that has guaranteed interoperability.
@gregsdennis I was reading @tviegut's comments as indicating what sort of things might break or need to be updated if we made this change. @tviegut we do often recommend using regexps for date verification since @gregsdennis I'm curious about which environments can't (as opposed to just currently don't) support ECMA-262. In particular the anchor thing, since we could easily say "if you anchor your regexes and don't use
My suggestion isn't based on personal experience, but rather on complaints I recall from others about ECMA-262 support in their language of choice. It's not always 100%. .NET, for example, has an ECMA-262-compliant mode, but it doesn't support some cases (which I can't specifically recall).
@handrews yes, that's why Now, I did some further digging since I posted earlier. In reviewing the background further, it turns out we had reps from both the U.S. and French teams thoroughly vet and test the regexes that were going to be published as part of the standard. Now, I see contributed to the draft the following:
@tviegut / @admin-cimug I don't want this getting off-topic. The point of this issue isn't "are regexes useful?" The point is determining the best regex specification. To that end, I don't think your comments add to that discussion. If your concern is follow-on specifications which repeat what JSON Schema states, I can't advise on a process to address that. JSON Schema continues to evolve, and as such, references like this will need to be updated.
@gregsdennis: No, the context of my comments wasn't whether they're useful. That's self-evident :). Rather, the context was in response to @handrews's statement to you:
(Also, apologies: @tviegut is my personal account, used from my GitHub app, and @admin-cimug is an SDO-related account.)
@tviegut thanks for the clarification. The idea behind this is that I-Regexp is supposed to be largely compatible with existing libraries, so things shouldn't break in practice. Basically, we'd be reducing the set of guaranteed expressions that are supported, but we're not restricting libraries from supporting additional expressions. We'd just be saying that those additional expressions wouldn't be guaranteed to be interoperable. I hope that makes sense.
Right. But if it's really only directly compatible with XSD due to the implicit anchoring, and incompatible with ECMA (and Perl, Python, Ruby, etc.), which is the ecosystem to which JavaScript, and therefore JSON, and therefore JSON Schema belong, it's not going to be of use to us. Not in the standard
Just a quick thought here: when reading iregexp (or especially ipattern) above, my first reaction was that this might be understood as a case-insensitive variant of pattern (i.e. like the
@m-mohr good point, naming is hard! If we do add new keywords, they will get their own issues for discussion first, so we don't need to sort that out here. But I'm glad you brought it up!
Given the resistance I've received when I asked about I-Regexp being based on XSD (and thus the implicit anchors), I'm no longer sure this is a good fit for us. I've made the argument that explicit anchors are more common and better known by developers, and they're not listening. I'm happy to close this if others are. Thanks for looking into it.
Yeah, I've weighed in over there, but they seem dead-set on ignoring the ecosystem they're allegedly targeting, so I'll probably give up pretty soon. It's baffling. I can't see any justification for an unintuitive breaking change to JSON Schema regexes that runs counter to the vast majority of JSON/JavaScript/ECMA technologies and the programming languages that most often parse them, particularly not while we're trying to emphasize stability.
Henry, iregexp was not designed for json-schema.org. For json-schema.org, the question of what kind of regexps you want to use is pretty much moot, as that ship has sailed. I don't understand why this needs to be discussed. iregexp won't "just drop in" in json-schema.org's The contribution iregexp can make here is that it provides a well-defined subset (intersection) that actually is widely interoperable, well beyond the JavaScript ecosystem. This subset needs a bit of translation to work with sliding (unanchored) regexps and explicit anchors, but it is still a useful subset. I would be much more interested in whether that subset meets your requirements than in the current discussion.
No one is making this claim. What's baffling is that iregexp is being created as an interoperable standard with the intention of being usable by other specifications, yet it's ignoring the ecosystem it claims to target.
But it's being developed specifically for JSON Path, an obvious JSON (thus JavaScript ecosystem) technology. If it's not a good fit for JSON Schema, I argue that it's not a good fit for JSON Path for the same reasons.
Very much not so. The main argument for iregexp is that we do not need a new regex dialect per application environment. iregexp is a good fit for JSONPath, which is why we are completing the work in the JSONPath WG. But the intention is for this spec to have wider application. I understand that the json-schema.org people adopted the ECMAScript dialect long ago, so going for a more general approach may seem unnatural here. I'm sorry, but that doesn't have a bearing on whether iregexp is a good fit for JSONPath.
We'll continue this argument in other channels.
I have been convinced that iregexp is not a good fit for JSON Schema.
Very interesting subject. Can someone tell me the current way to do multiline matching in a string pattern? The page at http://json-schema.org/understanding-json-schema/reference/regular_expressions.html does mention that Does this mean that the specified pattern cannot match multiline strings?
(In ECMAScript, that is the "dotAll" feature, triggered with the "s" flag.)
What do you mean by "the specified pattern"? The example on the page you reference deliberately does not match newlines. The ECMAScript flavor has indeed been designed under the assumption that flags can be specified with the regexp.
\s should be a workaround, but at least in python-jsonschema it does not work.
The classical way to emulate dotAll is a character class that combines positive and negative escapes, e.g., |
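For the multiline question, the character-class approach can be sketched like this, assuming `[\s\S]` is the kind of class meant. Python's `re` stands in for the engine. The point is that `[\s\S]` matches any character, including newlines, without needing a flag, which matters because `pattern` gives no way to pass the "s" flag. `\s` on its own matches only whitespace, so it is not a substitute for ".".

```python
import re

text = "first line\nsecond line"

# "." does not match "\n" unless DOTALL is enabled, so this fails.
plain_dot = re.fullmatch(r"first.*line", text)

# [\s\S] (whitespace OR non-whitespace) matches every character,
# including "\n", and needs no flag.
any_char = re.fullmatch(r"first[\s\S]*line", text)

# \s alone only covers whitespace, so it cannot stand in for ".".
whitespace_only = re.fullmatch(r"first\s*line", text)

# Python-only alternative for contrast: an inline flag at the start.
# Inline flags are a Python feature here; don't assume other dialects accept them.
inline_flag = re.fullmatch(r"(?s)first.*line", text)

print(plain_dot is None)        # True
print(any_char is not None)     # True
print(whitespace_only is None)  # True
print(inline_flag is not None)  # True
```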
The IETF JSON Path working group is putting together I-Regexp (authored by @cabo), intended as an interoperable subset of the various flavors of regular expressions. Think of it as a "capabilities intersection." The idea is that the spec defines only features which are known to be supported by the majority of existing regular expression libraries, which means that most libraries should be able to claim conformance without actually having to make any changes.
I would like to propose that we use that here instead of ECMA-262, which seems to have varying support across languages. It would likely reduce the set of guaranteed-supported expressions, but at least we would be able to legitimately claim some guaranteed support (which I don't think we can do right now).
This spec is still in draft phase and continues to evolve. I'm not too rushed on getting this in.