add "ann" as reserved keyword #2005

Open: wants to merge 1 commit into base: 4.x

Hazel-Datastax

I found a corner case when using the Data API (stargate/data-api#1806). I cannot use ann as my table name, but I can use it in CQL:

cassandra@cqlsh:default_keyspace> CREATE TABLE default_keyspace."ann" (t text PRIMARY KEY,v VECTOR<float,5>);

The reason is that the Java driver keeps a set containing all the reserved keywords. When the query builder builds the CREATE TABLE query, it calls tableName.asCql(true). Inside asCql(true), the driver checks whether the string is in the reserved keywords set and double-quotes it if it is. Unfortunately, the set doesn't contain ann.
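A minimal standalone sketch of the quoting decision described above. It is not the driver's actual code: the class `QuotingSketch`, the method `asCqlPretty` (standing in for `asCql(true)`) and the abbreviated `RESERVED` set are hypothetical, and the sketch additionally assumes that names which are not plain lowercase identifiers are always quoted. Because "ann" is not in the set, it comes back unquoted.

```java
import java.util.Set;

class QuotingSketch {
    // Hypothetical, heavily abbreviated reserved-keyword set.
    // As in the driver at the time of this PR, "ann" is NOT a member.
    static final Set<String> RESERVED = Set.of("add", "create", "from", "select", "table", "where");

    // Stand-in for asCql(true): leave the name unquoted only if it looks like a plain
    // lowercase identifier (an assumption of this sketch) and is not a reserved keyword.
    static String asCqlPretty(String internal) {
        boolean plain = internal.matches("[a-z][a-z0-9_]*");
        if (plain && !RESERVED.contains(internal)) {
            return internal;
        }
        return doubleQuote(internal);
    }

    // Quoting is always safe: wrap in double quotes and escape embedded quotes by doubling them.
    static String doubleQuote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(asCqlPretty("select"));  // "select"
        System.out.println(asCqlPretty("ann"));     // ann
        System.out.println(asCqlPretty("MyTable")); // "MyTable"
    }
}
```

Adding "ann" to a set like `RESERVED` is what this PR proposes; the discussion that follows questions whether that is correct, since "ann" is an unreserved keyword.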

I guess ann was introduced later and the keywords set hasn't been updated accordingly.

@absurdfarce
Contributor

Good catch @Hazel-Datastax! We actually had to address something very similar to this for dsbulk. Should've occurred to me this part of the Java driver might have an issue as well.

@absurdfarce
Contributor

absurdfarce commented Apr 25, 2025

So, there's definitely something weird going on here.

In Apache Cassandra 5.x "ann" is very definitely an unreserved keyword. The CQL docs in the Cassandra repo talk about the distinction a bit: reserved keywords can never be used as identifiers, while unreserved keywords can be in some situations... but those situations aren't specified. If an unreserved keyword is used in a spot that might introduce a conflict, it presumably would have to be quoted... but it's not clear how the driver can identify such a situation.

The dsbulk change I referenced above doesn't need to worry about this distinction. It includes its own ANTLR-derived parser (a subset of what's actually used in Cassandra), so it can identify these keyword cases using (essentially) the same grammar Apache Cassandra uses.

I also note that the set this PR adds "ann" to is explicitly for reserved keywords: each member of that set is a reserved keyword (as defined in the CQL docs above), and no unreserved keywords are included. Presumably that's because the code can always quote reserved keywords when generating CQL strings... but unreserved keywords are a bit trickier.

To make it even worse: I note the following against Apache Cassandra 5.0.0:

cqlsh> describe keyspace test;

CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;
cqlsh> CREATE TABLE test.filtering (i int PRIMARY KEY, j float);
cqlsh> CREATE TABLE test.ann (i int PRIMARY KEY, j float);
cqlsh>

The string "ann" works just fine as a table name there. But when I try something similar on Astra I get results similar to what I think you're describing:

token@cqlsh> CREATE TABLE janus.filtering (i int PRIMARY KEY, j float);
token@cqlsh> CREATE TABLE janus.function (i int PRIMARY KEY, j float);
token@cqlsh> CREATE TABLE janus.ann (i int PRIMARY KEY, j float);
SyntaxException: line 1:23 mismatched character '(' expecting set null
token@cqlsh> CREATE TABLE janus.”ann” (i int PRIMARY KEY, j float);
Invalid syntax at char 20
  CREATE TABLE janus.”ann” (i int PRIMARY KEY, j float);
                     ^
token@cqlsh> CREATE TABLE janus.’ann’ (i int PRIMARY KEY, j float);
Invalid syntax at char 20
  CREATE TABLE janus.’ann’ (i int PRIMARY KEY, j float);
                     ^

So we've clearly got inconsistencies in the behaviour here between Astra and Apache Cassandra. But to make matters worse, Astra is internally inconsistent: some unreserved keywords (such as "filtering" and "function") are just fine to use as table names, while I can't get "ann" to be used as a table name whether I quote it or not.

@absurdfarce
Contributor

@adutra @aratno @tolbertam I'm curious about what you guys think of this. Short version:

  • C* now defines "ann" and "vector" as unreserved keywords
  • The driver only has logic to specifically quote strings containing reserved keywords... since those need to be quoted in all cases when used in queries
  • Unreserved keywords need to be quoted in some situations but not others

My current thinking is that there isn't really much we can do here. Without better guidance as to when unreserved keywords should or shouldn't be quoted, the Java driver can't really interject, so it's up to the user to quote unreserved keywords when appropriate. If you have a full-blown CQL parser you could do better (see the dsbulk issue referenced above), but short of that you're kind of limited.
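If quoting is left to the user, application code that assembles CQL by hand can opt in to quoting specific unreserved keywords itself. A minimal sketch, assuming the application keeps its own list of names to quote; `ForceQuoting`, `ALWAYS_QUOTE`, `render` and `createTable` are hypothetical names, not driver APIs:

```java
import java.util.Set;

class ForceQuoting {
    // Hypothetical application-side list of unreserved keywords to always quote.
    // "ann" and "vector" are the two unreserved keywords mentioned above.
    static final Set<String> ALWAYS_QUOTE = Set.of("ann", "vector");

    // Takes an internal (already lowercase) name and quotes it if the application opted in.
    static String render(String internalName) {
        return ALWAYS_QUOTE.contains(internalName)
                ? "\"" + internalName + "\""
                : internalName;
    }

    // Hypothetical helper that assembles a CREATE TABLE statement by hand;
    // the column definitions are passed through verbatim.
    static String createTable(String keyspace, String table, String columns) {
        return "CREATE TABLE " + render(keyspace) + "." + render(table) + " (" + columns + ")";
    }

    public static void main(String[] args) {
        System.out.println(createTable("default_keyspace", "ann", "t text PRIMARY KEY, v VECTOR<float, 5>"));
        // CREATE TABLE default_keyspace."ann" (t text PRIMARY KEY, v VECTOR<float, 5>)
    }
}
```

Quoting a lowercase name such as "ann" does not change which table it refers to, because unquoted CQL identifiers are folded to lowercase; the output has the same shape as the quoted statement at the top of this PR.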

Thoughts?

@adutra
Contributor

adutra commented Jun 3, 2025

The token K_ANN has been added to the "basic unreserved keywords" rule by this commit: apache/cassandra@949b760.

I agree that unreserved keywords lack a clear, well-defined meaning, but in any case they can be table identifiers, since the cfName rule explicitly allows unquoted unreserved keywords:

cfName[QualifiedName name]
    : t=IDENT              { $name.setName($t.text, false); }
    | t=QUOTED_NAME        { $name.setName($t.text, true); }
    | k=unreserved_keyword { $name.setName(k, false); }
    | QMARK {addRecognitionError("Bind variables cannot be used for table names");}
    ;
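To make the rule's effect concrete, here is a simplified Java model of how the cfName alternatives treat a single table-name token. It is a sketch, not the ANTLR-generated parser: `CfNameSketch` is a hypothetical name, the keyword sets are tiny hypothetical samples, and tokenization is reduced to a single string.

```java
import java.util.Locale;
import java.util.Set;

class CfNameSketch {
    // Tiny hypothetical samples; the real grammar defines many more keywords.
    static final Set<String> RESERVED = Set.of("create", "from", "select", "table");
    static final Set<String> UNRESERVED = Set.of("ann", "filtering", "function", "vector");

    // Returns the table name produced for one token, or throws if cfName would not match it.
    static String cfName(String token) {
        if (token.equals("?")) {
            throw new IllegalArgumentException("Bind variables cannot be used for table names");
        }
        if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
            // QUOTED_NAME alternative: case is preserved and doubled quotes are unescaped
            return token.substring(1, token.length() - 1).replace("\"\"", "\"");
        }
        String lower = token.toLowerCase(Locale.ROOT);
        if (RESERVED.contains(lower)) {
            // Reserved words become keyword tokens, and no cfName alternative accepts them
            throw new IllegalArgumentException("reserved keyword cannot be an unquoted table name: " + token);
        }
        if (UNRESERVED.contains(lower)) {
            return lower; // unreserved_keyword alternative (setName(k, false): case-folded)
        }
        if (lower.matches("[a-z][a-z0-9_]*")) {
            return lower; // IDENT alternative (setName($t.text, false): case-folded)
        }
        throw new IllegalArgumentException("not a valid table name: " + token);
    }

    public static void main(String[] args) {
        System.out.println(cfName("ann"));     // ann
        System.out.println(cfName("\"ann\"")); // ann
        System.out.println(cfName("\"Ann\"")); // Ann
    }
}
```

Under this model "ann" is accepted unquoted, consistent with the Cassandra 5.0.0 transcript above; only reserved keywords are rejected when unquoted.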

So, I agree with @absurdfarce and I don't think it's correct to add ann to the reserved keywords set.

About Astra vs C* 5.0 observed differences:

  • If CREATE TABLE janus.ann is failing with Astra, I would investigate if Astra has the ann keyword added to the basic unreserved keywords rule – because otherwise this statement looks good to me and ann should pass the cfName rule.
    • In fact... it's even more mysterious than that, because if K_ANN is absent from the grammar, then the input ann should pass the IDENT rule, so the statement would still be correct. I wonder if Astra isn't using a very different grammar rule for table names? Can someone check?
  • I'm also surprised about this one: CREATE TABLE janus.”ann”. If this is failing, I wonder if this isn't because @absurdfarce used curly double quotes (” U+201D) instead of regular ones (" U+0022). @absurdfarce could you please confirm?
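One quick way to test the curly-quote hypothesis is to scan the pasted statement for typographic quote characters. A minimal Java sketch; `QuoteCheck` and `curlyQuotes` are hypothetical names, and it only looks for the four common curly quotes (U+2018, U+2019, U+201C, U+201D):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class QuoteCheck {
    // Returns "index: U+XXXX" (0-based index) for each curly quote character found.
    static List<String> curlyQuotes(String cql) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < cql.length(); i++) {
            char c = cql.charAt(i);
            if (c == '\u2018' || c == '\u2019' || c == '\u201C' || c == '\u201D') {
                found.add(String.format(Locale.ROOT, "%d: U+%04X", i, (int) c));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String pasted = "CREATE TABLE janus.\u201Dann\u201D (i int PRIMARY KEY, j float);";
        System.out.println(curlyQuotes(pasted)); // [19: U+201D, 23: U+201D]
        String plain = "CREATE TABLE janus.\"ann\" (i int PRIMARY KEY, j float);";
        System.out.println(curlyQuotes(plain));  // []
    }
}
```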

But in any case, and until we get more insights, the Astra behavior does not invalidate the fact that ann is not a reserved keyword and should be usable as a table identifier without quotes.
