Skip to content

Conversation

FurcyPin
Copy link

@FurcyPin FurcyPin commented May 23, 2025

This PR fixes #809.

Looking at the code, I realized that a similar issue was already solved: #784

The tricky part is that BEGIN can be used in two ways:

Case 1
The first way is as a BEGIN TRANSACTION statement:

(For instance, this is valid in MySQL or Postresql, and in this case we do want to split)

begin;
update foo
       set bar = 1;
commit;

Case 2
The second way is as a BEGIN ... END statement when creating a stored procedure, function or trigger.
#784 solved the bug, but only in the case of a CREATE TRIGGER statement, like this one.

CREATE TRIGGER IF NOT EXISTS remove_if_it_was_the_last_file_link
-- Delete the direntry when is removed it's last static link
    AFTER DELETE ON links
    WHEN NOT EXISTS
    (
        SELECT * FROM links
        WHERE child_entry = OLD.child_entry
        LIMIT 1
    )
BEGIN
    DELETE FROM dir_entries
    WHERE dir_entries.inode = OLD.child_entry;
END;

The problem is that each SQL dialect has its own syntax (e.g. WITH FUNCTION in Trino, CREATE OR REPLACE PROCEDURE in BigQuery, etc.)

Luckily, I found a simple way to tell the two cases apart:

If BEGIN is the first word of the statement, then you are in case 1, otherwise, it's case 2.

There is one edge case with PostgreSQL, though.
For instance, this is valid in psql :

CREATE FUNCTION doubledollarinbody(var1 text) RETURNS text
LANGUAGE plpgsql
AS $_$
DECLARE str text;
  BEGIN
    str = $$'foo'$$||var1;
    execute 'select '||str into str;
    return str;
  END
LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;

But according to my rule, the first semicolon should not be ignored.
Luckily, it seems you already handled this case because here, everything between the $_$ keywords is treated as one special block.

I had one test failing, the one with this query :

CREATE OR REPLACE FUNCTION foo(
	p_in1 VARCHAR
	, p_in2 INTEGER
) RETURNS INTEGER AS

  DECLARE
	v_foo INTEGER;  
  BEGIN
  	SELECT *
  	FROM foo
  	INTO v_foo;
  	RETURN v_foo.id;
  END;

But I suspect that this statement is incorrect.
I checked a few dialects (including MySQL and SQL Server) and I could not find any one besides PostgreSQL where the DECLARE statement is before the BEGIN, not inside. And as we just saw, PostgreSQL requires a $$ keyword.

So I edited this test file to put the DECLARE inside the BEGIN ... END block.
I also added a few more tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Split should ignore semicolons which are inside BEGIN ... END statements
1 participant