Skip to content

Monitor regex (monitor_re) has a minor regex bug #2942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
beasteers opened this issue Sep 13, 2023 · 1 comment · Fixed by #2950
Closed

Monitor regex (monitor_re) has a minor regex bug #2942

beasteers opened this issue Sep 13, 2023 · 1 comment · Fixed by #2950

Comments

@beasteers
Copy link
Contributor

beasteers commented Sep 13, 2023

Version: redis-py: 3.5.3

Platform: Python 3.10.11 mac

Description:

The redis monitor command parsing can break for certain messages (has been happening with byte strings that contain the substring "] ").

Basically, the fix is to make the client_info regex non-greedy:

# original
monitor_re = re.compile(r"\[(\d+) (.*)\] (.*)")
# fix:
monitor_re = re.compile(r"\[(\d+) (.*?)\] (.*)")

You can test here: https://regexr.com/

An example that works

[0 172.18.0.4:36030] "XADD" "default:main" "MAXLEN" "~" "1000" "*" "d" "asdf"

An example that breaks the original regex

[0 172.18.0.4:36030] "XADD" "default:main" "MAXLEN" "~" "1000" "*" "d" "] "

The assumption being that the client_info would never contain "] " which seems to be the case from what I can tell.

@beasteers
Copy link
Contributor Author

beasteers commented Sep 16, 2023

And just to document this somewhere: If you need to parse out byte strings from (e.g.) XADD commands, this is how you can restore the proper encoding. I'm not 100% certain there aren't failure cases - If anyone can identify any (or has a systematic way of validating) let me know.

The key is !! codecs.escape_decode(x) !!

import codecs
from redis.client import Monitor as Monitor_

class Monitor(Monitor_):  # use: Monitor(r.connection_pool)

    monitor_re = re.compile(rb"\[(\d+) (.*?)\] (.*)")
    command_re = re.compile(rb'"(.*?(?<!\\))"')

    def next_command(self):
        """Parse the response from a monitor command"""
        # https://github.com/redis/redis/blob/4031a187321be26fc96c044fec3ee759841e8379/src/replication.c#L576
        response = self.connection.read_response(disable_decoding=True)
        command_time, command_data = response.split(b" ", 1)
        db_id, client_info, raw_command = self.monitor_re.match(command_data).groups()
        command_parts = self.command_re.findall(raw_command)
        command_parts = [codecs.escape_decode(x)[0] for x in command_parts]

        client_info = client_info.decode()
        if client_info == "lua":
            client_address = client_type = "lua"
            client_port = ""
        else:
            client_address, client_port = client_info.rsplit(":", 1)  # ipv6 has colons
            client_type = "unix" if client_info.startswith("unix") else "tcp"

        return {
            "time": float(command_time.decode()),
            "db": int(db_id.decode()),
            "client_address": client_address,
            "client_port": client_port,
            "client_type": client_type,
            # return a list of bytes with mixed encodings (utf-8 for everything unless you're storing things like jpegs)
            "args": command_parts,  
        }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant