set qualifiers - feature idea #11

Closed
@mrabarnett

Original report by Anonymous.


Some background: I've been working with very large REs in CPython and IronPython. We generate the RE pattern from lists, like lists of cities or lists of names, somewhat like this:

import re
with open("names.txt") as f:
    namelist = f.read().split()
pattern = re.compile("|".join(map(re.escape, namelist)))  # escape any regex metacharacters in the names

The one I'm working with now is just a pattern for finding substrings that look like the name of a person. It's overflowing the System::Text::RegularExpressions buffers on IronPython, but works OK with CPython 2.6 on 64-bit Ubuntu.

One of the things I've been thinking is that this kind of pattern should be handled differently. Suppose there was some syntax like

pattern = re.compile("(?S<names>)", names=ImmutableSet(namelist))

where (?S<names>) refers to a named ImmutableSet whose members are taken from the keyword argument of the same name. The compiler would generate a reasonably fast pattern from that set, say a character class that is the union of all characters in all the strings in the set, with minimum and maximum repeat counts taken from the shortest and longest elements of the set. When the engine runs, it would first match that fast pattern and then check whether the matched text is a member of the named set. If so, the match would be confirmed; if not, it would fail.
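To make the idea concrete, here is a rough emulation of the proposed behaviour using only the standard `re` module. It is a sketch under assumptions, not the proposed `(?S<names>)` syntax itself: the helper name `find_members` is hypothetical, and the membership check runs in Python after the match rather than inside the engine.

```python
import re

def find_members(text, names):
    """Emulate the proposed set qualifier: coarse prefilter, then set lookup.

    Hypothetical helper for illustration; not part of re or regex.
    """
    members = frozenset(names)
    if not members or "" in members:
        raise ValueError("names must be a non-empty collection of non-empty strings")

    # Union of all characters in all the strings in the set, as a character class.
    alphabet = sorted(set("".join(members)))
    char_class = "[" + "".join(re.escape(c) for c in alphabet) + "]"

    # Repeat bounds come from the shortest and longest members.
    shortest = min(len(m) for m in members)
    longest = max(len(m) for m in members)
    prefilter = re.compile("%s{%d,%d}" % (char_class, shortest, longest))

    # Confirm each coarse match by checking membership in the set.
    return [m.group() for m in prefilter.finditer(text) if m.group() in members]

names = ["Alice", "Bob"]
print(find_members("Alice met Bob, then Carol.", names))  # ['Alice', 'Bob']
```

One limitation of this simple version: the prefilter is greedy, so it tests only one candidate span per starting position. For the text "Bobble" it grabs "Bobbl", which is not in the set, and never retries the shorter "Bob". An engine-level implementation would presumably need to backtrack over candidate lengths or anchor on word boundaries.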

Seems like this might be a useful feature for regex to have, given the popularity of this kind of machine-generated RE.


Labels: bug, minor
