This was epic! mbfilter.c is the "heart" of mbstring and was one of the
single biggest source files. Almost every significant function was written
in a wildly verbose and convoluted way. One repeated theme was the use of
explicit state machines which process strings character by character, storing
a state variable between each call. Converting these to 'straight-line' code
cleaned things up enormously.
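
To illustrate the difference in style, here is a minimal sketch (not code from
mbfilter.c). It counts CRLF pairs in a buffer twice: once with a per-byte
callback that keeps its state in a struct, and once as a plain loop. All names
(crlf_state, crlf_feed, count_crlf_state_machine, count_crlf_straight) are
hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state kept between per-byte calls. */
typedef struct {
    int saw_cr;    /* 1 if the previous byte was '\r' */
    size_t count;  /* CRLF pairs seen so far */
} crlf_state;

/* State machine style: called once per byte, remembers what came before. */
static void crlf_feed(crlf_state *st, unsigned char c)
{
    if (st->saw_cr && c == '\n') {
        st->count++;
    }
    st->saw_cr = (c == '\r');
}

size_t count_crlf_state_machine(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    crlf_state st = {0, 0};
    for (size_t i = 0; i < len; i++) {
        crlf_feed(&st, buf[i]);
    }
    return st.count;
}

/* Straight-line style: one loop which looks ahead in the buffer directly. */
size_t count_crlf_straight(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t count = 0;
    const unsigned char *p = buf, *end = buf + len;
    while (p < end) {
        if (*p == '\r' && p + 1 < end && p[1] == '\n') {
            count++;
            p += 2;  /* consume both bytes of the pair */
        } else {
            p++;
        }
    }
    return count;
}
```

Both functions return the same result; the second keeps all of the logic in
one place, with no state to carry between calls.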
Another theme was the presence of many trivial code comments stating the
obvious, but almost no comments on the parts which were not obvious. A lot of
detailed comments on the tricky parts have been added to smooth the way for
future developers.
I was worried that all those contortions might have been done in the name of
speed, and that replacing them with clean code might kill performance. Happily,
it turns out that the clean code is much faster.
The behavior is largely unchanged. One misfeature has been fixed: when decoding
HTML numeric entities, an invalid hexadecimal entity was unnecessarily
uppercased instead of being passed through unchanged. The handling of invalid
MIME "encoded words" when decoding MIME headers is also a bit different.
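
The pass-through idea for invalid hexadecimal entities can be sketched roughly
as follows. This is not the mbstring implementation: parse_hex_entity is a
hypothetical helper, and the accepted syntax (a required trailing ';', an
upper-case 'X' prefix being allowed, a cap at U+10FFFF) is a simplifying
assumption. The point is only that on failure the parser reports 0 and
modifies nothing, so the caller can copy the original bytes verbatim.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: if s starts with a valid "&#x<hex digits>;" entity,
 * store the code point in *cp and return the number of bytes consumed.
 * Otherwise return 0 and leave *cp untouched, so the caller can pass the
 * input through byte for byte, with no change of case. */
size_t parse_hex_entity(const char *s, size_t len, unsigned long *cp)
{
    assert(s != NULL && cp != NULL);
    if (len < 5 || s[0] != '&' || s[1] != '#' || (s[2] != 'x' && s[2] != 'X')) {
        return 0;
    }

    unsigned long value = 0;
    size_t i = 3;
    while (i < len) {
        char c = s[i];
        unsigned digit;
        if (c >= '0' && c <= '9') {
            digit = (unsigned)(c - '0');
        } else if (c >= 'a' && c <= 'f') {
            digit = (unsigned)(c - 'a') + 10;
        } else if (c >= 'A' && c <= 'F') {
            digit = (unsigned)(c - 'A') + 10;
        } else {
            break;
        }
        value = value * 16 + digit;
        if (value > 0x10FFFF) {
            return 0;  /* beyond the Unicode range: treat as invalid */
        }
        i++;
    }

    /* Need at least one hex digit and a terminating ';' (an assumption). */
    if (i == 3 || i >= len || s[i] != ';') {
        return 0;
    }
    *cp = value;
    return i + 1;
}
```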
Interestingly, while I was working on this refactoring, Nikita Popov fixed a
bug in mb_strimwidth. It turns out that this refactoring would also have fixed
the same bug (N. Popov's new test passed on the refactored code without any
changes).
Cases where it is necessary to count characters backwards from the end of
a UTF-8 or UTF-16LE string have also been optimized. There is still more
juicy, low-hanging fruit for performance optimization, however.
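
As a rough illustration of counting backwards in UTF-8 (a sketch, not the code
in this commit): continuation bytes always have the bit pattern 10xxxxxx, so
one can step back from the end and skip over them without decoding from the
start. The function name utf8_offset_from_end is hypothetical, and the input
is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte offset at which the last `n` characters of `s` begin.
 * If the string holds fewer than `n` characters, return 0.
 * Assumes `s` is valid UTF-8; invalid input is not handled here. */
size_t utf8_offset_from_end(const unsigned char *s, size_t len, size_t n)
{
    assert(s != NULL);
    size_t pos = len;
    while (n > 0 && pos > 0) {
        pos--;
        /* Step back over continuation bytes (10xxxxxx) to the lead byte. */
        while (pos > 0 && (s[pos] & 0xC0) == 0x80) {
            pos--;
        }
        n--;
    }
    return pos;
}
```

For example, in the 6-byte string "h\xC3\xA9llo" the last 4 characters start
at byte offset 1.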