Proposal: Matching expressions #7

Closed
gregglind opened this issue Jul 23, 2013 · 4 comments

@gregglind
Contributor

This might go beyond the original spec, or might belong in a different project :)

What if path bits allowed a limited set of expressions:

$[0 OP value] where OP = [==, >, <, >=, <=, !=, =~] (=~ -> regex on repr)

There are some gross side effects:

  • integer vs string values for value
  • more complicated parsing

(How dirty and complicated this gets depends partly on which OPs are allowed.)

This capability exists in http://goessner.net/articles/JsonPath/ . I have also found it useful in other projects.

@kennknowles
Owner

I am OK with the idea of expressions like these as long as they remain portable and have a clear definition.

I'm OK with overloading brackets or using the ?() syntax from the old blog post, but perhaps you can suggest an idea that uses a backtick-enclosed "named operator"?

Comments on your operators:

  • I think == and != have obvious semantics with no surprises: two things are equal exactly when they are the same value, and otherwise they are not equal.
  • Regex is not as good, since there are so many regex standards, but still OK: Just choose a basic regex standard and non-strings simply do not match.
  • Ordering is getting too complex for JSONPath. There are only two consistent ways I know of to do ordering in a dynamic setting, and I don't think either is a good match:
    1. Throw an exception when the two things have different types.
    2. One giant ordering for all values.

Of course, there are inconsistent ways to do it, but that kind of sacrifice needs very compelling justification. Do you have particular uses in mind or is this just a discussion because you are interested?
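For illustration only, the two consistent orderings mentioned above could be sketched in Python roughly as follows. This is a minimal sketch, not code from this project: the names `json_type`, `strict_less_than`, and `total_order_key` are hypothetical, and the type ranking used for the "one giant ordering" is an arbitrary choice.

```python
def json_type(value):
    """Name the JSON type of a decoded Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must precede the int check: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def strict_less_than(a, b):
    """Option 1: raise when the two values have different JSON types."""
    if json_type(a) != json_type(b):
        raise TypeError(f"cannot order {json_type(a)} against {json_type(b)}")
    # Python may still refuse here (for example, two dicts have no ordering).
    return a < b


# Option 2: one total order, ranking by type first and by value second.
# The rank order itself is an assumption made for this sketch.
_RANK = {"null": 0, "boolean": 1, "number": 2, "string": 3, "array": 4, "object": 5}


def total_order_key(value):
    """Build a sort key that makes any two JSON values comparable."""
    kind = json_type(value)
    if kind == "null":
        payload = 0
    elif kind == "array":
        payload = [total_order_key(item) for item in value]
    elif kind == "object":
        payload = sorted((key, total_order_key(item)) for key, item in value.items())
    else:
        payload = value
    return (_RANK[kind], payload)


mixed = ["b", 3, None, True, 1.5, "a"]
print(sorted(mixed, key=total_order_key))
```

Option 1 keeps the semantics obvious but makes a filter fail on heterogeneous data; option 2 never fails but bakes in a ranking that users have to learn.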

@gregglind
Contributor Author

Thanks for thinking about this! I am impressed that you are 'doing it right' on the lexer / parser, and am glad to hear your feedback on the idea!

  1. These could be treated as 'advanced features' in the way that JSONSelect https://github.com/lloyd/JSONSelect (and http://jsonselect.org/#docs/levels) has 'levels'. Given the inherent mismatches between JSON and Python, and between 'strings' and other types, it's hard to know how to both 'get it right' and have it not be as verbose and 'overwrought' as XML.
  2. When implementing this before, I found that == was quite gnarly, in that it implied casting (in the parser). Think: parse("b.field==1"). Should that match as a string? A repr? Both?
  3. Regex - PCRE is probably not expected
  4. I am fine with throwing out ordering, but it's easy-ish to implement if one sticks to 'singleton' types. I can definitely see real use cases (JS timestamps in a period, for example).
  5. The ?( ) has precedent from bash. What does / should evaling in there look like though? Should it assume JS semantics? I.e., is it $(len(this) > 0)? Seems like it pushes off the DSL one more level.

My own crude implementation of all this is at https://gist.github.com/gregglind/6066375#file-filter_mapper-py-L30-L49 . That script was mostly about 'match' (and return the whole item), not 'only return subsections'.

The original impetus was running filtering Hadoop jobs, but a minilang against JSON is compelling in its own right. Being able to say "Must match this form AND/OR give me these parts" using the same DSL is very attractive. Having 'filter expressions' removes the annoying try/except and type-mismatch dances that make for tedious interactions with JSON in Python!

Some real examples from my own usage:

"-1.data.group=~command"    # field [4]["data"]["group"].search('command') is True
"4==129294922222"
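
To make the two examples above concrete, here is a rough Python sketch of how such expressions might be evaluated. It is not the code from the gist: the function names, the restriction to `==`, `!=`, and `=~`, and the rule that an all-digit value is read as an integer are all assumptions made for illustration. Following the original proposal, `=~` runs a regex search on the `repr` of the value.

```python
import re

# path, operator, value; the path is matched lazily so the first operator wins.
_EXPR = re.compile(r"^(?P<path>.+?)(?P<op>==|!=|=~)(?P<value>.*)$")


def parse_path(path):
    """Split 'a.b.-1' into steps; integer-looking steps become list indexes."""
    steps = []
    for part in path.split("."):
        try:
            steps.append(int(part))
        except ValueError:
            steps.append(part)
    return steps


def parse_literal(raw):
    """Assumption: integer-looking values are integers, everything else a string."""
    try:
        return int(raw)
    except ValueError:
        return raw


def matches(record, expression):
    """Return True when the record satisfies a 'PATH OP VALUE' expression."""
    parsed = _EXPR.match(expression)
    if parsed is None:
        raise ValueError(f"not a filter expression: {expression!r}")
    target = record
    try:
        for step in parse_path(parsed["path"]):
            target = target[step]
    except (KeyError, IndexError, TypeError):
        return False  # a path that does not resolve never matches
    op, raw = parsed["op"], parsed["value"]
    if op == "=~":
        return re.search(raw, repr(target)) is not None
    if op == "==":
        return target == parse_literal(raw)
    return target != parse_literal(raw)


row = ["a", "b", "c", "d", 129294922222, {"data": {"group": "shell-command"}}]
print(matches(row, "-1.data.group=~command"))
print(matches(row, "4==129294922222"))
```

The unresolved-path case returning False is what replaces the try/except dance in calling code.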

@kennknowles
Owner

  1. Regarding ==, if I understand your syntax:
  • b.field==1 returns True for all dicts where b.field is the integer 1 and False for all others.
  • If you wished to match the string "1" you would use the JSONPath b.field=="1".

Embedding all of JSON syntax into this simple language is unfortunate, so you have just about convinced me it is better to do this filtering in a layer on top of this.

  2. Regarding regexes:
  • These now look fairly simple and reasonable. Regexes themselves are a concrete syntax so enclosing them in quotes is the easiest way to embed them.
  • The grammar should make it unambiguous whether a quoted string is a field name or not.
  3. Regarding type mismatch dances:
  • Probably useful to have type filtering.
  4. Regarding eval aka embedding a scripting language into JSONPath:
  • Once you include a full scripting language, the benefit of simplicity is lost along with interoperability and clear semantics. I won't do this, but almost all the useful cases are handled by a simple expression language.
I believe that the most productive additions, relative to the amount of unresolved issues, would be type filtering and regexes. This all bears some contemplation.
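
A minimal Python sketch of what those two additions could mean, under stated assumptions: Python's `re` module stands in for whichever regex standard would actually be chosen, the type names are the usual JSON ones, and `regex_matches`, `json_type`, and `filter_by_type` are hypothetical helpers rather than anything this library provides.

```python
import re


def regex_matches(pattern, value):
    """Regex filter: non-strings simply do not match (no casting via repr)."""
    return isinstance(value, str) and re.search(pattern, value) is not None


def json_type(value):
    """Name the JSON type of a decoded Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must precede the int check: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def filter_by_type(values, type_name):
    """Type filter: keep only the values of the named JSON type."""
    return [value for value in values if json_type(value) == type_name]


values = ["shell-command", 42, None, "other", True, {"group": "command"}]
print([v for v in values if regex_matches("command", v)])
print(filter_by_type(values, "number"))
```

With both filters in place, a caller can narrow to strings before applying a regex, which removes most of the type-mismatch handling from user code.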

@kennknowles
Owner

Merging with #8
