Description
#include <boost/regex.hpp>
#include <cstddef>
#include <cstring>  // std::memcpy
#include <string>

// N0 counts the trailing '\0' of the literal; N is the pattern length without it.
template<std::size_t N0, std::size_t N = N0 - 1>
void tester( char const (&str)[ N0 ] )
{
    std::string s(N, '\0');
    std::memcpy(s.data(), str, N);  // non-const data() requires C++17
    boost::regex rx(s);
    boost::match_results<std::string::const_iterator> what;
    std::string where(15, 'H');  // the subject: fifteen 'H' characters
    bool match = boost::regex_match(where, what, rx, boost::match_default | boost::match_partial | boost::match_perl | boost::match_posix | boost::match_any);
    (void) match;
}

int main()
{
    char const str2[] = "(Y(*COMMIT)|\\K\\D|.)+";
    tester( str2 );
}
This regex causes an assertion to be tripped (https://godbolt.org/z/7hYfKx96n):
output.s: /app/boost/include/boost/regex/v5/match_results.hpp:625: void boost::match_results<BidiIterator, Allocator>::maybe_assign(const boost::match_results<BidiIterator, Allocator>&) [with BidiIterator = __gnu_cxx::__normal_iterator<const char*, std::__cxx11::basic_string<char> >; Allocator = std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<const char*, std::__cxx11::basic_string<char> > > >]: Assertion `base2 >= 0' failed.
Program terminated with signal: SIGSEGV
If you examine the code, this is the offending line: https://github.com/boostorg/regex/blob/boost-1.86.0/include/boost/regex/v5/match_results.hpp#L625
However, don't be fooled by this! The actual bug is quite decoupled from that line; the assertion just happens to catch the UB.
The input is "HHHHHHHHHHHHHHH" and the regex is /(Y(*COMMIT)|\K\D|.)+/
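To make explicit what pattern actually reaches the regex engine, here is a small standard-library-only sketch. `literal_to_pattern` is a hypothetical helper, not part of the reproducer; it copies a string literal into a `std::string` the same way `tester` does, dropping the trailing '\0'. The doubled backslashes in the C++ source are string-literal escapes, so the engine sees single backslashes.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not part of the reproducer): copies a string literal
// into a std::string the way tester() does, leaving out the trailing '\0'.
template <std::size_t N0>
std::string literal_to_pattern(char const (&str)[N0]) {
    static_assert(N0 > 0, "a string literal always includes its terminator");
    std::string s(N0 - 1, '\0');
    std::memcpy(&s[0], str, N0 - 1);  // &s[0] also works before C++17
    return s;
}
```

Calling `literal_to_pattern("(Y(*COMMIT)|\\K\\D|.)+")` yields the 20-character pattern `(Y(*COMMIT)|\K\D|.)+`, with no embedded NUL and no doubled backslashes.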
I double-checked the implementation against Perl and we're actually doing the correct thing here, but we fail to unwind properly.
The correct regex behavior here is that we match in the middle, on the \K\D, which we actually do. The problem is that we don't stop processing the input when we reach the end of the string (even though we should).
This regex only fails because of the (*COMMIT) we introduce. (*COMMIT) turns off the match_any flag, which means this branch gets hit: https://github.com/boostorg/regex/blob/boost-1.86.0/include/boost/regex/v5/perl_matcher_non_recursive.hpp#L1100
When that happens, we invoke unwind(false), which is incorrect. We unwind with "match not found", which keeps the parse going; we then match on the wildcard at the end, and from there all hell breaks loose.
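The control flow described above can be pictured with a toy model. This is only a sketch of the reasoning in this report, not Boost.Regex code: every name here (`toy_match_any`, `toy_apply_commit`, `toy_on_match`) and the bit value are hypothetical, and the real matcher checks more conditions than this.

```cpp
#include <cstdint>

// Toy stand-in for a matcher flag. The name and bit value are hypothetical
// and are NOT Boost.Regex's real definitions.
enum toy_flag : std::uint32_t {
    toy_match_any = 1u << 0,  // "the first match found is good enough"
};

enum class toy_outcome {
    stop_with_match,  // accept the match that was just found
    keep_searching,   // stands in for the unwind(false) path in this report
};

// Models the (*COMMIT) side effect described above: the "any" bit is cleared.
inline std::uint32_t toy_apply_commit(std::uint32_t flags) {
    return flags & ~static_cast<std::uint32_t>(toy_match_any);
}

// Models the decision taken once a candidate match has been found.
inline toy_outcome toy_on_match(std::uint32_t flags) {
    return (flags & toy_match_any) ? toy_outcome::stop_with_match
                                   : toy_outcome::keep_searching;
}
```

In this model, `toy_on_match(toy_match_any)` stops with the match, while `toy_on_match(toy_apply_commit(toy_match_any))` keeps searching, which mirrors how the parse continues past the \K\D match once (*COMMIT) has been seen.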
I've tried a few things to fix this behavior but every attempt that fixes this issue breaks a million others in the regression tests.