Fix GH-19065: Long match statement can segfault compiler during recursive SSA renaming #19083

nielsdos · 2025-07-09T19:23:25Z

On some systems, like Alpine, the thread stack size is small by default. The last step of SSA construction involves variable renaming that is recursive, and also makes copies of their version of the renamed variables on the stack. This combination causes a stack overflow during compilation on Alpine. Triggerable for example with very long match statements.
We previously ran into similar issues, where we also transformed to iterative algorithms, e.g. #14432

A stop-gap solution would be to use heap allocated arrays for the renamed variable list, but that would only delay the error as increasing the number of match arms increases the depth of the dominator tree, and will eventually run into the same issue.

Instead, this patch transforms the algorithm into an iterative one. There are two states stored in a worklist stack: positive numbers indicate that the block still needs to undergo variable renaming. Negative numbers indicate that the block and its dominated children are already renamed. Because 0 is also a valid block number, we bias the block numbers by adding 1.
To restore to the right variant when backtracking the "recursive" step, we index into an array pointing to the different variable renaming variants.

Zend/Optimizer/zend_ssa.c

…cursive SSA renaming On some systems, like Alpine, the thread stack size is small by default. The last step of SSA construction involves variable renaming that is recursive, and also makes copies of their version of the renamed variables on the stack. This combination causes a stack overflow during compilation on Alpine. Triggerable for example with very long match statements. A stop-gap solution would be to use heap allocated arrays for the renamed variable list, but that would only delay the error as increasing the number of match arms increases the depth of the dominator tree, and will eventually run into the same issue. This patch transforms the algorithm into an iterative one. There are two states stored in a worklist stack: positive numbers indicate that the block still needs to undergo variable renaming. Negative numbers indicate that the block and its dominated children are already renamed. Because 0 is also a valid block number, we bias the block numbers by adding 1. To restore to the right variant when backtracking the "recursive" step, we index into an array pointing to the different variable renaming variants.

arnaud-lb

This looks good to me! I have only nits.

arnaud-lb · 2025-08-07T07:12:07Z

Zend/Optimizer/zend_ssa.c

+			/* Push children in enter state */
+			unsigned int child_count = 0;
+			int len_prior = work.len;
+			int j = ssa->cfg.blocks[n].children;


For leaves, can we jump directly to backtrack, and avoid pushing/poping -(n + 1)?

arnaud-lb · 2025-08-07T07:55:35Z

Zend/Optimizer/zend_ssa.c

+			// FIXME: Can we optimize this copying out in some cases?
+			int *new_var;
+			if (ssa->cfg.blocks[n].next_child >= 0) {
+				new_var = emalloc(sizeof(int) * (op_array->last_var + op_array->T));


We could still use alloca here (with a shared limit), as we free new_vars in reverse allocation order. Or an arena.

I think we need only one allocation per depth level, too, as the lifetime of each emalloc() here do not overlap for two blocks of the same depth. This is probably out of the scope of this bug fix however, but maybe we could do something like this:

if (ssa->cfg.blocks[n].next_child >= 0) { int level = ssa->cfg.blocks[n].level; if (save_vars[level] == save_vars[level-1]) { save_vars[level] = alloc(...); } new_vars = save_vars[level]; memcpy(new_vars, save_vars[level-1]); } else { new_vars = save_vars[level]; } // When pushing children: save_vars[level+1] = save_vars[level];

Then we don't need backtracking, and we can remove save_positions. Before leaving the function we only need to free any save_vars != vars.

I'll leave that for master, I don't want to risk making too much complicated changes on stable versions

arnaud-lb · 2025-08-07T08:36:24Z

Zend/Optimizer/zend_ssa.c

+			for (unsigned int i = save_vars_top, p = save_positions[n]; i > p; i--) {
+				efree(save_vars[i]);
+			}
+
+			save_vars_top = save_positions[n];


I think this always frees exactly one save_vars or nothing: save_vars_top will always be either save_positions[n]+1 (when we have allocated vars for this block) or save_positions[n] (when we did not) because any increments when visiting children will have been reverted at this point.

This could be simplified to

if (save_vars_top != save_positions[n]) { ZEND_ASSERT(save_vars_top == save_positions[n]+1); efree(save_vars[save_vars_top]); save_vars_top--; }

save_positions could be replaced by a bitset indicating whether we had allocated save_vars for the block:

if (zend_bitset_in(allocated, n)) { efree(save_vars[save_vars_top]); save_vars_top--; }

We can even avoid a bitset by repeating the creation condition

nielsdos · 2025-08-09T20:59:25Z

@arnaud-lb I've implemented some of your suggestions, can you please double check?

arnaud-lb

Looks good to me!

nielsdos linked an issue Jul 9, 2025 that may be closed by this pull request

Long match statement can segfault compiler during recursive SSA renaming #19065

Closed

github-actions bot added the Category: Optimizer label Jul 9, 2025

nielsdos commented Jul 9, 2025

View reviewed changes

Zend/Optimizer/zend_ssa.c Show resolved Hide resolved

nielsdos force-pushed the fix-19065 branch from efefad7 to 96d11a3 Compare July 9, 2025 19:31

nielsdos marked this pull request as ready for review July 9, 2025 20:56

nielsdos requested a review from dstogov as a code owner July 9, 2025 20:56

nielsdos requested a review from arnaud-lb August 6, 2025 20:56

arnaud-lb approved these changes Aug 7, 2025

View reviewed changes

nielsdos added 2 commits August 9, 2025 21:56

Directly jump to backtrack when possible

e4d878c

Get rid of extra list

ec991a4

arnaud-lb approved these changes Aug 11, 2025

View reviewed changes

nielsdos closed this in 9b86533 Aug 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix GH-19065: Long match statement can segfault compiler during recursive SSA renaming #19083

Fix GH-19065: Long match statement can segfault compiler during recursive SSA renaming #19083

Uh oh!

nielsdos commented Jul 9, 2025

Uh oh!

Uh oh!

arnaud-lb left a comment

Uh oh!

arnaud-lb Aug 7, 2025

Uh oh!

arnaud-lb Aug 7, 2025

Uh oh!

nielsdos Aug 9, 2025

Uh oh!

arnaud-lb Aug 7, 2025

Uh oh!

nielsdos Aug 9, 2025

Uh oh!

nielsdos commented Aug 9, 2025

Uh oh!

arnaud-lb left a comment

Uh oh!

Uh oh!

Fix GH-19065: Long match statement can segfault compiler during recursive SSA renaming #19083

Fix GH-19065: Long match statement can segfault compiler during recursive SSA renaming #19083

Uh oh!

Conversation

nielsdos commented Jul 9, 2025

Uh oh!

Uh oh!

arnaud-lb left a comment

Choose a reason for hiding this comment

Uh oh!

arnaud-lb Aug 7, 2025

Choose a reason for hiding this comment

Uh oh!

arnaud-lb Aug 7, 2025

Choose a reason for hiding this comment

Uh oh!

nielsdos Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

arnaud-lb Aug 7, 2025

Choose a reason for hiding this comment

Uh oh!

nielsdos Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

nielsdos commented Aug 9, 2025

Uh oh!

arnaud-lb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!