Conversation

richardantal
Contributor

No description provided.

@richardantal richardantal requested review from stoty and virajjasani and removed request for stoty July 14, 2025 14:12
@richardantal
Contributor Author

I probably need to add more tests for this functionality.
Please review it; I would like to know what you think about it.

@virajjasani
Contributor

This could be a nice feature!
I am curious about the use case: Is this required for auto-commit connections from sqlline?

@tkhurana
Contributor

@richardantal Is there a performance gain from using multiple values in a single query? I don't think so, since we can achieve the same by batching multiple upsert statements and then doing a commit.
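
To make the comparison concrete, the two statement shapes being discussed can be sketched as plain SQL strings. This is only an illustration, not code from the PR: the class `UpsertSqlSketch`, its method names, and the table name `T` are hypothetical, and the literal tuples are concatenated only to mirror the test statements in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of Phoenix): renders both statement shapes as text. */
public class UpsertSqlSketch {

    /** One statement carrying every tuple: UPSERT INTO T VALUES (..),(..) */
    static String multiRowUpsert(String table, List<String> tuples) {
        if (tuples.isEmpty()) {
            throw new IllegalArgumentException("at least one value tuple is required");
        }
        StringBuilder sql = new StringBuilder("UPSERT INTO ").append(table).append(" VALUES ");
        for (int i = 0; i < tuples.size(); i++) {
            if (i > 0) {
                sql.append(',');
            }
            sql.append('(').append(tuples.get(i)).append(')');
        }
        return sql.toString();
    }

    /** Batching style: one single-row statement per tuple, committed together by the caller. */
    static List<String> singleRowUpserts(String table, List<String> tuples) {
        List<String> statements = new ArrayList<>();
        for (String tuple : tuples) {
            statements.add(multiRowUpsert(table, Collections.singletonList(tuple)));
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> tuples = Arrays.asList("'C', 31, 32", "'D', 41, 42");
        System.out.println(multiRowUpsert("T", tuples));
        for (String statement : singleRowUpserts("T", tuples)) {
            System.out.println(statement);
        }
    }
}
```

With the batching style, each of the generated statements is parsed and compiled separately; the multi-row form submits the same rows as a single statement.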

conn.createStatement().execute("UPSERT INTO " + tableName + "(K, INT, INT2) VALUES ('E', 5, 5),('F', 61, 6)");
conn.createStatement().execute("UPSERT INTO " + tableName + " VALUES ('C', 31, 32),('D', 41, 42)");
conn.createStatement().execute("UPSERT INTO " + tableName + " VALUES ('G', 7, 72),('H', 8)");
conn.createStatement().execute("UPSERT INTO " + tableName + " VALUES ('I', 9),('I', 10)");
Contributor

I don't think this is valid standard SQL (i.e. not all columns are specified, but there is no column list either), but if this works for the single-value upsert then it's fine.

}

@Test
public void testValidMultipleUpsert3() throws Exception {
Contributor

A few more failure cases to exercise the parser would be nice,
e.g. no value list at all, no commas between the value lists, unclosed parentheses, and similar.

conn.createStatement().execute(ddl);


conn.createStatement().execute("UPSERT INTO " + tableName + " VALUES ('A', 11, 12)");
Contributor

I'd also like to have some tests for expressions, and for a mix of expressions and literals,
e.g.
VALUES (substring('APPLE',0,1), 2*2)

+ constantExpression.toString() + " in column " + column);

int index = -1;
for (byte[][] valuesListIem : valuesList) {
Contributor

@stoty stoty Jul 15, 2025

Keeping the two lists in sync like this is awkward.
Using an old-style for() would be more readable to me:
for (int index = 0; index < valuesList.size(); index++)
and then use the index to get both the valuesListItem and the constantExpression.
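
The suggestion can be sketched roughly as follows. This is a standalone illustration under assumptions, not the PR's code: the class `ParallelListsSketch`, the method `describe`, and the use of `String` labels in place of the real expression objects are hypothetical, and `valuesList` is assumed to be a `List<byte[][]>`.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: walk two parallel lists with one shared index. */
public class ParallelListsSketch {

    /** Pairs each expression label with the value array stored at the same position. */
    static String describe(List<String> expressions, List<byte[][]> valuesList) {
        if (expressions.size() != valuesList.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        StringBuilder out = new StringBuilder();
        for (int index = 0; index < valuesList.size(); index++) {
            // Both elements come from the same index, so no separate counter is needed.
            byte[][] valuesListItem = valuesList.get(index);
            String expression = expressions.get(index);
            out.append(expression).append(" -> ")
               .append(valuesListItem.length).append(" value(s)\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<byte[][]> valuesList = Arrays.asList(new byte[2][], new byte[3][]);
        System.out.print(describe(Arrays.asList("row0", "row1"), valuesList));
    }
}
```

Compared with a for-each loop plus a manually incremented counter, there is only one variable that can drift out of step.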

Contributor

Also a typo in valuesListIem

@stoty
Contributor

stoty commented Jul 15, 2025

> @richardantal Is there a performance gain from using multiple values in a single query? I don't think so, since we can achieve the same by batching multiple upsert statements and then doing a commit.

My take:

  • Avoids network RTT for Query Server
  • Avoids multiple query parse and compilation phases

also:

  • Adds a previously unimplemented SQL feature.
  • Improves compatibility with existing SQL tools that expect this feature (though in reality, most of those tools don't handle upserts anyway)

@richardantal
Contributor Author

Thanks for the reviews; I added a few more tests.

I also uploaded a simple performance test for this feature to the ticket as an attachment.
For me, inserting simple rows one per statement took 65 sec; inserting 2 rows per statement took 59 sec, 4 rows 57 sec, and 8 rows 56 sec (with the same number of commits in each run).
So there is a minor performance improvement, but performance is not the main point of this feature.
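
The relative improvement implied by these timings can be worked out directly. The snippet below is just arithmetic on the numbers quoted in this comment; the class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch: relative speed-up computed from the timings quoted above. */
public class UpsertTimingSketch {

    /** Percentage of time saved relative to the baseline run. */
    static double improvementPercent(double baselineSeconds, double seconds) {
        return (baselineSeconds - seconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        double baselineSeconds = 65.0;          // one row per statement
        int[] rowsPerStatement = {2, 4, 8};
        double[] seconds = {59.0, 57.0, 56.0};  // timings reported in the comment
        for (int i = 0; i < seconds.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%d rows per statement: %.1f%% less time",
                    rowsPerStatement[i], improvementPercent(baselineSeconds, seconds[i])));
        }
        // prints:
        // 2 rows per statement: 9.2% less time
        // 4 rows per statement: 12.3% less time
        // 8 rows per statement: 13.8% less time
    }
}
```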

@stoty
Contributor

stoty commented Jul 16, 2025

A 10% improvement for bulk loading is nothing to sneeze at, @richardantal.

Contributor

@stoty stoty left a comment

+1 LGTM

: UPSERT (hint=hintClause)? INTO t=from_table_name
(LPAREN p=upsert_column_refs RPAREN)?
- ((VALUES LPAREN v=one_or_more_expressions RPAREN ( ON DUPLICATE KEY ( ig=IGNORE |
+ ((VALUES LPAREN e = one_or_more_expressions {v.add(e);} RPAREN (COMMA LPAREN e = one_or_more_expressions {v.add(e);} RPAREN )* ( ON DUPLICATE KEY ( ig=IGNORE |
Contributor

nit:
Would extracting LPAREN e = one_or_more_expressions {v.add(e);} RPAREN into a function make sense?

@richardantal
Contributor Author

I rebased the change to resolve the conflict with the spotless change.
Let's wait for a test run.

@stoty
Contributor

stoty commented Oct 9, 2025

The last CI run still shows spotless issues.
Is there anything blocking this, @richardantal?

@richardantal
Contributor Author

I rebased the change.
It shouldn't contain spotless issues anymore.

@virajjasani
Contributor

Given the grammar changes, we want this only for 5.4.0, right?

}
boolean isAutoCommit = connection.getAutoCommit();
- if (valueNodes == null) {
+ if (valueNodesList.isEmpty()) {
Contributor

Shouldn't this also consider valueNodesList being null, as we do above with the for-loop?

@richardantal
Contributor Author

Thanks for the question, Viraj; it is a good one.
In PhoenixSQL.g I added an init phase for valueNodesList,
so I think it cannot be null.
https://github.com/richardantal/phoenix-1/blob/PHOENIX-7198/phoenix-core-client/src/main/antlr3/PhoenixSQL.g#L868

I tweaked the above condition a little, to be closer to what we had earlier.
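
If a defensive check were wanted regardless of the grammar's initialization, a null-tolerant emptiness test is cheap. The following is a minimal sketch under that assumption; `ValueListGuardSketch` and `hasNoValues` are hypothetical names, not code from the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: treat a null list and an empty list the same way. */
public class ValueListGuardSketch {

    /** Returns true when there are no value lists to process, whether null or empty. */
    static boolean hasNoValues(List<?> valueNodesList) {
        return valueNodesList == null || valueNodesList.isEmpty();
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        System.out.println(hasNoValues(null));   // true
        System.out.println(hasNoValues(values)); // true
        values.add("('A', 11, 12)");
        System.out.println(hasNoValues(values)); // false
    }
}
```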

targetColumns.add(rowTimestampCol);
if (valueNodes != null && !valueNodes.isEmpty()) {
valueNodes.add(getNodeForRowTimestampColumn(rowTimestampCol));
if (valueNodesList != null) {
Contributor Author

@richardantal richardantal Oct 16, 2025

This check is probably not needed, but better to be safe.
