Commit 3ae8ec4

regcomp.c: Change handling of filled EXACT nodes
This changes the detection mechanism to check just before writing to see if it would be out of bounds, and if so, instead break out of the loop, and go close out the node. Prior to this commit space for a worst-case scenario was reserved, and we didn't start a new character if we were in that danger zone. This left nodes less fully packed than they could have been. Thus this improves the packing of nodes, especially under /i, from the previous mechanism. But more importantly, it sets things up so that we can potentially increase the node size as we go along.

This also changes the handling of avoiding splitting a multi-character fold across nodes under /i. For example, take the sequence 'ffi'. We wouldn't want to end a node with 'ff' when the first character in the next node is an 'i', as U+FB03 folds to that sequence, and the code that does pattern matching can't currently match across node boundaries.

Previously we backed off filling the node until the final character wasn't one that could potentially cause such a break. That is, we didn't look at the next character to see if it was an 'i' (or some other potential multi-char fold). Now we do look at the next character(s), and only back off if this actually would split a real multi-char fold.
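The two changes described above can be illustrated with a small stand-alone sketch. This is not the code from regcomp.c: the names `toy_folds`, `splits_fold`, and `fill_node` are hypothetical, the fold table is an abbreviated ASCII-only stand-in for the real multi-character fold data, and case folding itself is left out. It only shows the shape of the idea: check for overflow immediately before each write, and back off from a full node only when the boundary would actually cut through a fold sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, abbreviated table of already-folded multi-character
 * sequences. The real fold data is far larger; this is an assumption
 * made purely for illustration. */
static const char *const toy_folds[] = {
    "ffi", "ffl", "ff", "fi", "fl", "ss", "st"
};
#define TOY_FOLD_COUNT (sizeof toy_folds / sizeof toy_folds[0])

/* Return 1 if ending a node at offset `boundary` would cut through an
 * occurrence of a fold sequence that begins inside the current node
 * (at or after `node_start`) and continues past the boundary. */
int splits_fold(const char *pat, size_t pat_len,
                size_t node_start, size_t boundary)
{
    for (size_t i = 0; i < TOY_FOLD_COUNT; i++) {
        size_t n = strlen(toy_folds[i]);
        /* k = number of the fold's characters left inside the node */
        for (size_t k = 1; k < n; k++) {
            if (boundary < node_start + k)
                continue;               /* fold would start before the node */
            size_t pos = boundary - k;
            if (pos + n > pat_len)
                continue;               /* fold would run off the pattern */
            if (memcmp(pat + pos, toy_folds[i], n) == 0)
                return 1;
        }
    }
    return 0;
}

/* Decide how many characters of `pat`, starting at `node_start`, go
 * into one node holding at most `max_len` characters. */
size_t fill_node(const char *pat, size_t pat_len,
                 size_t node_start, size_t max_len)
{
    size_t len = 0;

    assert(max_len > 0 && node_start <= pat_len);

    while (node_start + len < pat_len) {
        /* Check just before writing: would this character overflow? */
        if (len + 1 > max_len)
            break;                      /* node is full; go close it out */
        len++;                          /* "write" the character */
    }

    /* Only if the node filled up with more pattern to come: look at the
     * following character(s) and back off while the boundary would
     * split a real fold sequence. */
    if (node_start + len < pat_len) {
        while (len > 1
               && splits_fold(pat, pat_len, node_start, node_start + len))
            len--;
    }
    return len;
}
```

With a node limit of 5, the pattern "abcffi" yields a first node of length 3 ("abc"), so that "ffi" stays together in the next node, while "abcffx" keeps all 5 characters ("abcff") because no fold in the toy table spans that boundary. Under the earlier approach described in the message, the trailing 'f' alone would have been enough to trigger a back-off in both cases.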
1 parent c45abc0 commit 3ae8ec4

2 files changed

+292
-153
lines changed
