assertion failure with crafted regex #232

Closed
@cmazakas

#include <boost/regex.hpp>
#include <cstddef>
#include <cstring> // std::memcpy
#include <string>  // std::string

template<std::size_t N0, std::size_t N = N0 - 1>
void tester( char const (&str)[ N0 ] )
{
   std::string s(N, '\0');
   std::memcpy(s.data(), str, N);
   boost::regex rx(s);
   boost::match_results<std::string::const_iterator> what;
   std::string where(15, 'H');
   bool match = boost::regex_match(where, what, rx, boost::match_default | boost::match_partial | boost::match_perl | boost::match_posix | boost::match_any);
   (void) match;
}

int main()
{
   char const str2[] = "(Y(*COMMIT)|\\K\\D|.)+";
   tester( str2 );
}

This regex causes an assertion to be tripped (https://godbolt.org/z/7hYfKx96n):

output.s: /app/boost/include/boost/regex/v5/match_results.hpp:625: void boost::match_results<BidiIterator, Allocator>::maybe_assign(const boost::match_results<BidiIterator, Allocator>&) [with BidiIterator = __gnu_cxx::__normal_iterator<const char*, std::__cxx11::basic_string<char> >; Allocator = std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<const char*, std::__cxx11::basic_string<char> > > >]: Assertion `base2 >= 0' failed.
Program terminated with signal: SIGSEGV

If you examine the code, this is the offending line: https://github.com/boostorg/regex/blob/boost-1.86.0/include/boost/regex/v5/match_results.hpp#L625

Don't be fooled, though. The actual bug is quite decoupled from this line; the assertion just happens to catch the UB.

The input is "HHHHHHHHHHHHHHH" and the regex is /(Y(*COMMIT)|\K\D|.)+/

I double-checked the implementation against Perl, and we're actually doing the correct thing here, but we fail to unwind properly.

The correct regex behavior here is that we match in the middle, on the \K\D alternative, which we actually do. The problem is that we don't stop processing the input when we reach the end of the string, even though we should.

This regex only fails because of the (*COMMIT) we introduce. (*COMMIT) turns off the match_any flag, which means this branch gets hit: https://github.com/boostorg/regex/blob/boost-1.86.0/include/boost/regex/v5/perl_matcher_non_recursive.hpp#L1100

When that happens, we invoke unwind(false), which is incorrect. We unwind with "match not found", which keeps the parse going; we then match on the wildcard at the end, and from there all hell breaks loose.

I've tried a few things to fix this behavior, but every attempt that fixes this issue breaks a million others in the regression tests.
