SyntaxError on HDF queries where right-hand contains string delimiter #6901
Comments
You know, I think it might be as simple as adding a This works:
|
This is fixed in 0.13.1, IIRC. In any event, you should use the new syntax in your queries rather than Term directly: http://pandas.pydata.org/pandas-docs/stable/io.html#querying-a-table |
This isn't and never was a bug. The query contains unmatched quotes, so it's invalid syntax. |
There was an issue with strings containing |
The first and second case are really the same thing. If you were to write them outside of the query string, they would be the following (invalid) code:

    index == Al Lawson'

The third works because it results in

    index == """Al Lawson'"""

And the fourth doesn't work because you have the following invalid code:

    index == '''Al Lawson''''

Notice there are 3 quotes on the lhs and 4 on the rhs. |
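For anyone who wants to check the quoting cases outside of pandas, here is a minimal sketch using only the standard library `ast` module. The helper `parses` and the variable names are hypothetical, made up for illustration. Using `repr()` to build the right-hand side is just one way to get a validly quoted literal and is an assumption, not necessarily what was suggested in this thread.

```python
import ast


def parses(expr: str) -> bool:
    """Return True if expr is a syntactically valid Python expression."""
    try:
        ast.parse(expr, mode="eval")
        return True
    except SyntaxError:
        return False


title = "Al Lawson'"  # value containing a string delimiter

# Right-hand side with an unmatched quote: not valid Python.
unquoted = "index == Al Lawson'"

# Wrapped in triple double quotes: valid, the inner ' is just a character.
triple_double = 'index == """Al Lawson\'"""'

# Wrapped in triple single quotes: 3 quotes open, 4 close, so one is left over.
triple_single = "index == '''Al Lawson''''"

# Assumption: let repr() choose a delimiter that does not clash with the value.
via_repr = f"index == {title!r}"

results = {
    "unquoted": parses(unquoted),
    "triple_double": parses(triple_double),
    "triple_single": parses(triple_single),
    "via_repr": parses(via_repr),
}
print(results)
```

Only the triple-double-quoted form and the `repr()` form should parse, which matches the explanation above.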
So the user is responsible for doing
|
@ariddell Yep. Or |
Ok. Thanks! |
@ariddell it's impossible for the parser to know that you meant an exact string (and not just an error) if you don't quote it, just like in Python syntax (which is what's doing the parsing) |
@jreback u stole my ⚡ ! :) |
Might be nice to have a note in the docs? (especially since it worked in 0.12) |
@ariddell Agreed! What did it do there? automatically quote? care to do a doc PR? |
It must have automatically done the |
@ariddell Ok np. I'll take this one then. |
Reopening for doc issue |
yah I guess just make a mention that u need to quote strings (although all strings are quoted in examples) - it IS python syntax - but I suppose u can clarify |
I just realized that none of this is even necessary if you just pass in the variable that you're interested in
I'll make a doc note about this, but there's really no need to mention quoting |
Good to know. I have to say, I'm a little concerned about grabbing the contents of local variables by name. It strikes me as something borrowed from R. In any event, thanks for your work on this. |
@ariddell What are you concerned about? I'm happy to explain any of the machinery to you. There's actually a full parse of the query string into something like expr = Eq(lhs=Term('index', df.index), rhs=Term('title', "Al Lawson'")) which allows pandas to do some alignment if necessary but ultimately it's passed to PyTables (which passes to |
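To make the "full parse" idea a bit more concrete, here is a rough sketch of how a query string that names a local variable could be parsed and resolved. This is not the pandas implementation: `resolve_query` and its `scope` argument are hypothetical names, and the sketch only handles a single `==` comparison using the standard library `ast` module.

```python
import ast


def resolve_query(query: str, scope: dict) -> tuple:
    """Parse 'name == rhs' and return (name, value).

    If rhs is a bare name, its value is looked up in scope;
    if rhs is a literal, the literal value is used as-is.
    """
    node = ast.parse(query, mode="eval").body
    if not (
        isinstance(node, ast.Compare)
        and len(node.ops) == 1
        and isinstance(node.ops[0], ast.Eq)
    ):
        raise ValueError("this sketch only handles a single == comparison")

    lhs, rhs = node.left, node.comparators[0]
    if not isinstance(lhs, ast.Name):
        raise ValueError("left-hand side must be a plain name")

    if isinstance(rhs, ast.Name):
        value = scope[rhs.id]  # variable referenced by name in the string
    elif isinstance(rhs, ast.Constant):
        value = rhs.value  # literal written directly in the string
    else:
        raise ValueError("unsupported right-hand side")
    return lhs.id, value


title = "Al Lawson'"
column, value = resolve_query("index == title", {"title": title})
print(column, repr(value))
```

Because the value is taken from the variable rather than spliced into the string, the quote inside it never reaches the parser.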
numexpr seems like a special case -- since it's departing from the land of Python. But I suppose that's true for PyTables too. What causes me concern is just the "principle of least surprise" -- in Python you never refer to a local variable in a string. |
If your index is a string that contains a ' or " there is potentially no consistent way to do an HDF query from disk. The following is real-ish data from the Wikipedia pagecounts dataset. Worked in 0.12, SyntaxError in 0.13. Easy to reproduce:
This is test.csv