Commit 3ae8ec4
regcomp.c: Change handling of filled EXACT nodes
This changes the detection mechanism to check just before writing to see
if it would be out of bounds and, if so, to break out of the loop
and go close out the node. Prior to this commit, space for a worst-case
scenario was reserved, and we didn't start a new character if we were in
that danger zone. This left nodes less fully packed than they could
have been.
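The check-before-write idea can be sketched roughly as follows. This is an illustration only, not the code in regcomp.c: `exact_node`, `NODE_CAPACITY`, and `fill_node` are hypothetical names, the capacity is made up, and each character is assumed to arrive as a NUL-terminated byte string (for example its UTF-8 encoding).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity; not the real limit used by regcomp.c. */
#define NODE_CAPACITY 8

/* Hypothetical stand-in for the string buffer of an EXACT node. */
typedef struct {
    char   bytes[NODE_CAPACITY];
    size_t len;
} exact_node;

/*
 * Pack whole characters chars[start..n) into the node.
 * Returns the index of the first character that did not fit,
 * or n if everything was packed.
 */
size_t fill_node(exact_node *node, const char *const *chars,
                 size_t start, size_t n)
{
    size_t i = start;

    node->len = 0;
    while (i < n) {
        size_t clen = strlen(chars[i]);

        /* A single character must be able to fit in an empty node. */
        assert(clen > 0 && clen <= NODE_CAPACITY);

        /* Check just before writing: would this character overflow? */
        if (node->len + clen > NODE_CAPACITY)
            break;  /* node is full; the caller closes it out */

        memcpy(node->bytes + node->len, chars[i], clen);
        node->len += clen;
        i++;
    }
    return i;
}
```

Calling `fill_node` repeatedly, each time starting from the index it last returned, fills one node after another. A node is closed only when the next character really would not fit, rather than as soon as the remaining space drops below a reserved worst-case margin.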
Thus this improves the packing of nodes, especially under /i, from the
previous mechanism. But more importantly, it sets things up so that we
can potentially increase the node size as we go along.
This also changes the handling of avoiding splitting a multi-character
fold across nodes under /i. For example, take the sequence 'ffi'. We
wouldn't want to end a node with 'ff', when the first character in the
next node is an 'i', as U+FB03 folds to that sequence, and the code that
does pattern matching can't currently match across node boundaries.
Previously we backed off filling the node until the final character
wasn't one that could potentially cause such a break. That is, we didn't
look at the next character and see if it was an 'i' (or some other
potential multi-char fold). Now we do look at the next
character(s), and only back off if this actually would split a real
multi-char fold.
1 parent c45abc0 commit 3ae8ec4
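The look-ahead rule for multi-character folds described in this message might be sketched like this. It is a simplified illustration, not the regcomp.c implementation: `splits_fold` and `choose_split` are hypothetical helpers, the input is assumed to be already-folded ASCII, and the table is only a small subset of the sequences that real multi-character folds produce (for instance, U+FB03 folds to "ffi" and U+00DF folds to "ss").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of multi-character fold sequences. */
static const char *const folds[] = {
    "ff", "fi", "fl", "ffi", "ffl", "st", "ss"
};

/* Return 1 if some fold sequence in s starts before pos and ends after it. */
int splits_fold(const char *s, size_t len, size_t pos)
{
    size_t f;

    for (f = 0; f < sizeof folds / sizeof folds[0]; f++) {
        size_t flen  = strlen(folds[f]);
        /* Earliest start at which a sequence of this length still crosses pos. */
        size_t start = (pos >= flen - 1) ? pos - (flen - 1) : 0;

        for (; start < pos; start++) {
            if (start + flen <= len
                && memcmp(s + start, folds[f], flen) == 0)
                return 1;
        }
    }
    return 0;
}

/*
 * Starting from the desired split point, back off only while the split
 * would actually cut through a fold sequence. A result of 0 means no
 * acceptable split was found and the caller must handle that case.
 */
size_t choose_split(const char *s, size_t len, size_t want)
{
    size_t pos = want;

    assert(want <= len);
    while (pos > 0 && splits_fold(s, len, pos))
        pos--;
    return pos;
}
```

With the input "xffiy" and a desired split after "xff", the sketch backs off to just after "x", because both "ffi" and "ff" would otherwise be cut. With "xffay" the split after "xff" is kept, since the following 'a' does not complete any fold sequence.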