Commit d5235cb

Major refactoring and optimization of mbfilter.c
This was epic! mbfilter.c is the "heart" of mbstring and was one of its biggest source files. Almost every significant function was written in a wildly verbose and convoluted way.

One recurring theme was the use of explicit state machines that process strings character by character, storing a state variable between calls. Converting these to 'straight-line' code cleaned things up enormously. Another theme was the presence of many trivial code comments stating the obvious, but almost no comments on the parts which were not obvious. Many detailed comments on the tricky parts have been added to smooth the way for future developers.

I was worried that all those contortions might have been done in the name of speed, and that replacing them with clean code might kill performance. Happily, it turns out that the clean code is much faster.

The behavior is largely unchanged. One misfeature has been fixed: when decoding HTML numeric entities, an invalid hexadecimal entity was unnecessarily uppercased rather than simply passed through. The handling of invalid MIME "encoded words" when decoding MIME headers is also slightly different.

Interestingly, while I was working on this refactoring, Nikita Popov fixed a bug in mb_strimwidth. It turns out that this refactoring would also have fixed the same bug (N. Popov's new test passed on the refactored code without any changes).

Cases where it is necessary to count characters backwards from the end of a UTF-8 or UTF-16LE string have also been optimized. There is still more juicy, low-hanging fruit for performance optimization, however.
1 parent 9c330a3 commit d5235cb

6 files changed: +1149 -2200 lines
