Make it possible to have a dynamic reverse proxy #1132

Closed
piraz opened this issue Apr 21, 2022 · 13 comments · Fixed by #1180

@piraz

piraz commented Apr 21, 2022

Let me explain what I understand from the reverse proxy plugin and the reverse proxy implementation.

From the plugin: https://github.com/abhinavsingh/proxy.py/blob/develop/proxy/plugin/reverse_proxy.py#L16-L26. The code below directs /get to httpbin.org/get over either http or https.

REVERSE_PROXY_LOCATION: str = r'/get$'
# Randomly choose either http or https upstream endpoint.
#
# This is just to demonstrate that both http and https upstream
# reverse proxy works.
REVERSE_PROXY_PASS = [
    b'http://httpbin.org/get',
    b'https://httpbin.org/get',
]

Suppose I change the reverse proxy location so that every path going through the proxy is matched, like this:

REVERSE_PROXY_LOCATION: str = r'/(.*)$'

Now everything (e.g. /get, /post/, /abcddfd/fd/dfd/33/3423432, etc.) will be matched by the proxy, but will be dispatched to one of the fixed strings in the REVERSE_PROXY_PASS array.

This is clear from how the reverse proxy handles the request: https://github.com/abhinavsingh/proxy.py/blob/develop/proxy/http/server/reverse.py#L65-L71.

    def handle_request(self, request: HttpParser) -> None:
        # TODO: Core must be capable of dispatching a context
        # with each invocation of handle request callback.
        #
        # Example, here we don't know which of our registered
        # route actually matched.
        #
        for route in self.reverse:
            pattern = re.compile(route)
            if pattern.match(text_(request.path)):
                self.choice = Url.from_bytes(
                    random.choice(self.reverse[route]),   #  <==== RIGHT HERE, this is a string and that's it
                )
                break
        assert self.choice and self.choice.hostname
        port = self.choice.port or \
            DEFAULT_HTTP_PORT \
            if self.choice.scheme == b'http' \
            else DEFAULT_HTTPS_PORT
        self.initialize_upstream(text_(self.choice.hostname), port)

My use case is a reverse proxy that matches a regular expression and forwards to a dynamic upstream URL, where groups captured by the pattern can be inserted into that URL.

I would like a way to inject whatever was captured by the pattern r'/(.*)$', or any other pattern, into self.reverse[route], in order to have something like:

REVERSE_PROXY_LOCATION: str = r'/(.*)$'
REVERSE_PROXY_PASS = [
    b'http://localhost:8090/{1}',
]
# or
REVERSE_PROXY_PASS = [
    b'http://localhost:8090/{match}',
]
# Could be another pattern; those are just ideas.

I cannot see a solution for that right now.
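
To make the idea concrete, here is a minimal sketch of how captured groups could be substituted into a configured upstream. It is not proxy.py code: `expand_upstream` is a hypothetical helper, and it assumes `{0}` stands for the whole match, `{1}`, `{2}`, ... for capture groups, and `{match}` for the first group.

```python
import re
from typing import Optional


def expand_upstream(location: str, template: bytes, path: str) -> Optional[bytes]:
    """Return the upstream URL for path, or None if location does not match.

    Hypothetical helper; the placeholder conventions are assumptions.
    """
    match = re.match(location, path)
    if match is None:
        return None
    groups = match.groups()
    # bytes has no .format(), so format as text and encode the result.
    url = template.decode('utf-8').format(
        match.group(0),                      # {0}: the whole matched text
        *groups,                             # {1}, {2}, ...: capture groups
        match=groups[0] if groups else '',   # {match}: first group (assumed)
    )
    return url.encode('utf-8')


if __name__ == '__main__':
    print(expand_upstream(r'/(.*)$', b'http://localhost:8090/{1}', '/abc/33'))
    print(expand_upstream(r'/(.*)$', b'http://localhost:8090/{match}', '/abc/33'))
```

Only the template is passed through `str.format`, so braces in the request path itself are not interpreted.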

The approach of adding patterns will add complexity to the code but it will remove complexity from the developer's plugin.

Another solution would be to add a hook on the ReverseProxyBasePlugin that takes over the self.choice = Url.from_bytes... step and receives the matched location together with the pattern and self.reverse[route]. We would just override this method and return the upstream URL.

Currently there is no way to hook into handle_request inside the ReverseProxy that is added by the --enable-reverse-proxy option. An option to specify which ReverseProxy class to use could be another solution to this problem.
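
The overridable-hook idea could look roughly like the sketch below. Every name here (`BaseReverseProxy`, `resolve_upstream`, `dispatch`, `DynamicReverseProxy`) is hypothetical and stands in for the core handler; none of it is proxy.py's actual API.

```python
import random
import re
from typing import Dict, List, Optional


class BaseReverseProxy:
    """Hypothetical stand-in for the core handler, not proxy.py's class."""

    def __init__(self, routes: Dict[str, List[bytes]]) -> None:
        self.routes = routes

    def resolve_upstream(
        self, route: str, match: 're.Match[str]', candidates: List[bytes],
    ) -> bytes:
        # Default behaviour: pick one of the fixed upstreams at random.
        return random.choice(candidates)

    def dispatch(self, path: str) -> Optional[bytes]:
        for route, candidates in self.routes.items():
            match = re.match(route, path)
            if match:
                # The hook receives the route, the match and the candidates.
                return self.resolve_upstream(route, match, candidates)
        return None


class DynamicReverseProxy(BaseReverseProxy):
    """A plugin overrides the hook to build the upstream from the match."""

    def resolve_upstream(
        self, route: str, match: 're.Match[str]', candidates: List[bytes],
    ) -> bytes:
        return random.choice(candidates) + match.group(1).encode('utf-8')


if __name__ == '__main__':
    proxy = DynamicReverseProxy({r'/(.*)$': [b'http://localhost:8090/']})
    print(proxy.dispatch('/abc/33'))
```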

Please let me know if this issue can be solved by the current code and if I'm missing something.

Could you help me with this issue?

@abhinavsingh
Owner

@piraz Thank you for bringing this to my attention.

Indeed, we need to make reverse proxy more powerful to make it useful in real-world scenarios. We started adding reverse proxy support when folks came asking for it. It was added as a starter template and was never well thought-out.

I see two solutions, similar to what you proposed:

  1. Add support for automatic replacement of matched elements in configured URLs
    • This is more complex, time-consuming and error-prone
  2. Simply drop those 2 variables and make everything API-oriented
    • The plugin gets a callback with the necessary metadata to decide the fate of the incoming request

IMHO, option 2 is less work for me and importantly more flexible.

@abhinavsingh abhinavsingh added the Proposal Proposals for futuristic feature requests label Apr 21, 2022
@piraz
Author

piraz commented Apr 21, 2022

Thanks @abhinavsingh. I think for the time being, using the skeleton app would be an option to replace the core ReverseProxy.

Let me try to hack that together, and I'll keep you posted if it works.

@rpgmaster280

rpgmaster280 commented May 26, 2022

What's the ETA on this functionality? I have a strong use case for it and it's a bummer that it's not implemented. =(

For command line options, it would be awesome if there were something like this:

proxy.py --reverse-proxy '/route1/(.*):https://remote1.com/route1/{0}' '/route2/(.*):https://remote2.com/route2/{0}'

I know it doesn't cover POST request rewrites, but simple URI rewrites from the command line would be amazing. Nginx and Apache offer something similar, but they are too heavyweight for my use case. Mitmproxy does not offer this capability, and that's unfortunate because its command line options are kind of simple.
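
As a rough illustration of the proposed syntax, the sketch below parses `LOCATION:UPSTREAM` pairs with argparse and applies them. The `--reverse-proxy` flag does not exist in proxy.py; it and the helpers `parse_route`, `build_parser` and `rewrite` are hypothetical. The sketch splits on the first colon, so a location regex containing `:` would need a different separator.

```python
import argparse
import re
from typing import List, Optional, Tuple

Route = Tuple['re.Pattern[str]', str]


def parse_route(spec: str) -> Route:
    """Split 'LOCATION:UPSTREAM' on the first colon (an assumed convention)."""
    location, sep, upstream = spec.partition(':')
    if not (sep and location and upstream):
        raise argparse.ArgumentTypeError('expected LOCATION:UPSTREAM, got %r' % spec)
    try:
        return re.compile(location), upstream
    except re.error as exc:
        raise argparse.ArgumentTypeError('bad regex %r: %s' % (location, exc))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag, modelled on the command line proposed above.
    parser.add_argument('--reverse-proxy', nargs='+', type=parse_route, default=[])
    return parser


def rewrite(routes: List[Route], path: str) -> Optional[str]:
    """Return the rewritten upstream URL for the first matching route."""
    for pattern, upstream in routes:
        match = pattern.match(path)
        if match:
            return upstream.format(*match.groups())   # {0} is the first group
    return None


if __name__ == '__main__':
    args = build_parser().parse_args([
        '--reverse-proxy',
        '/route1/(.*):https://remote1.com/route1/{0}',
        '/route2/(.*):https://remote2.com/route2/{0}',
    ])
    print(rewrite(args.reverse_proxy, '/route2/x/y'))
```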

@rpgmaster280

rpgmaster280 commented May 26, 2022

Script I'm working on right now (assuming initially proposed changes) is as follows:

import proxy
from typing import List, Tuple
from proxy.http.server import ReverseProxyBasePlugin

class ReverseProxyPlugin(ReverseProxyBasePlugin):
    def routes(self) -> List[Tuple[str, bytes]]:
        return [
                (r'/path1/(.*)', b"https://127.0.0.1:8443/{0}"),
                (r'/path2/(.*)', b"https://127.0.0.1:7443/{0}"),
                (r'/path3/(.*)', b"https://127.0.0.1:6443/{0}")
        ]

if __name__ == "__main__":
    with proxy.Proxy(
            input_args=[
                "--threaded",
                "--enable-reverse-proxy",
                "--hostname=0.0.0.0"
            ],
            enable_web_server=True,
            port=443,
            plugins=[
                ReverseProxyPlugin,
            ],
    ) as _:
        proxy.sleep_loop()

This is simple and self-contained, and would be significantly less of a headache than nginx or apache since they use so many config files.

@abhinavsingh
Owner

@rpgmaster280 Thanks for the proposal. This looks pretty close. I am also thinking that, for a given regex, a user might be interested in load-balancing between multiple endpoints. FWIW, I don't even want to anticipate or restrict what developers can do once a pattern is matched.

Here is how I imagine the routes type definition might look:

List[Tuple[str, Union[List[bytes], Callable[[str, str], None]]]]

The two str parameters to the callable will be the matched regex and the actual path. I am thinking there might not be any need for return types. These reverse proxy callback functions could simply raise an exception to abort/tear down connections.
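
A rough sketch of how a dispatcher might treat the two kinds of route targets in that type. The `dispatch` function and the `deny_admin` callback are hypothetical; the sketch assumes the callable receives the matched regex and the request path, returns nothing, and signals an abort by raising.

```python
import random
import re
from typing import Callable, List, Optional, Tuple, Union

RouteTarget = Union[List[bytes], Callable[[str, str], None]]
Routes = List[Tuple[str, RouteTarget]]


def dispatch(routes: Routes, path: str) -> Optional[bytes]:
    """Return an upstream for list targets; invoke callables and return None."""
    for pattern, target in routes:
        if re.match(pattern, path) is None:
            continue
        if callable(target):
            # The callback takes over; an exception here aborts the request.
            target(pattern, path)
            return None
        # A list of upstreams: naive load-balancing via random choice.
        return random.choice(target)
    return None


def deny_admin(pattern: str, path: str) -> None:
    """Hypothetical callback that always rejects the request."""
    raise PermissionError('%s is not proxied (matched %s)' % (path, pattern))


routes: Routes = [
    (r'/admin/.*', deny_admin),
    (r'/get$', [b'http://httpbin.org/get', b'https://httpbin.org/get']),
]

if __name__ == '__main__':
    print(dispatch(routes, '/get'))
```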

@piraz asked for match groups and IMHO we must also make way for them. I'll give it more thought over the weekend. Once we have a clear idea of what to deliver, delivery should happen quickly.

Feel free to chime in with ideas/suggestions.

Best

@rpgmaster280

rpgmaster280 commented May 26, 2022

I have some code that's pretty close. Haven't fully vetted it yet, but it's something like this (in reverse.py):

[image: screenshot of the modified code in reverse.py, not preserved]

Line 68 was throwing exceptions for me, so I had to change it to get the proxy to work. Lines 71 to 74 are where the actual changes are. I haven't tested this in the case where no regex groups are defined.
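
The untested case, a template that uses `{0}` while the route regex defines no groups, can be caught before any request arrives. The sketch below is one way to check it, not the change shown in the screenshot: `validate_route` is a hypothetical helper and it assumes `{0}` refers to the first capture group.

```python
import re
from string import Formatter


def validate_route(location: str, template: bytes) -> None:
    """Raise ValueError if template uses a {N} that location cannot fill."""
    available = re.compile(location).groups   # number of capture groups
    for _, field, _, _ in Formatter().parse(template.decode('utf-8')):
        # field is None for trailing literal text; only check numeric fields.
        if field is not None and field.isdigit() and int(field) >= available:
            raise ValueError(
                'template %r uses {%s} but %r defines %d group(s)'
                % (template, field, location, available)
            )


if __name__ == '__main__':
    validate_route(r'/wx64/(.*)', b'http://127.0.0.1:8443/wx64/{0}')   # passes
    try:
        validate_route(r'/wx64/', b'http://127.0.0.1:8443/wx64/{0}')
    except ValueError as exc:
        print(exc)
```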

@rpgmaster280

rpgmaster280 commented May 26, 2022

Tested the code change above and it seems to work fine. Can't seem to get the tool to work with certificates and ssl though. I believe that to be a separate issue from this one. This solution does not include any sort of user error reporting. Do you want me to open a pull request with the changes?

[image: screenshot, not preserved]

import proxy
from typing import List, Tuple
from proxy.http.server import ReverseProxyBasePlugin


class ReverseProxyPlugin(ReverseProxyBasePlugin):
    def routes(self) -> List[Tuple[str, bytes]]:
        return [
                (r'/wx64/(.*)', b"http://127.0.0.1:8443/wx64/{0}"),
                (r'/wx86/(.*)', b"http://127.0.0.1:8443/wx86/{0}"),
                (r'/vnc64/(.*)', b"http://127.0.0.1:8443/vnc64/{0}")
        ]

if __name__ == "__main__":
    with proxy.Proxy(
            input_args=[
                "--threaded",
                "--enable-reverse-proxy",
                "--hostname=0.0.0.0",
            ],
            enable_web_server=True,
            port=8080,
            plugins=[
                ReverseProxyPlugin,
            ],
    ) as _:
        proxy.sleep_loop()

So I think I figured out the issue I was having with the connection to the forwarded server. Is it possible to specify the certificates the server uses for the forwarded connection? Been looking at the command line arguments and I don't see anything.

@abhinavsingh
Owner

@rpgmaster280 If you want please go ahead and send a PR. I can then either merge your PR and work on top of it, or simply work on top of your PR itself. I will try to take a look at it over the weekend.

@rpgmaster280

rpgmaster280 commented May 27, 2022

Note that wildcard replacements are slow. It's better to be explicit when using regex pattern matching. This should be captured in the documentation. Please see below:

https://docs.bmc.com/docs/discovery/113/improving-the-performance-of-regular-expressions-788111995.html
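
As an illustration of what "being explicit" might mean for a route, the sketch below contrasts a broad wildcard with an anchored pattern that uses an explicit character class. The route and file names are made up, and no timing is measured here; the visible difference is what each pattern accepts.

```python
import re

# Broad: the wildcard accepts anything after the prefix, including more slashes.
BROAD = re.compile(r'/wx64/(.*)')

# Explicit: one path segment from a known character set, anchored at the end.
# The allowed characters are an assumption about what the upstream serves.
EXPLICIT = re.compile(r'/wx64/([A-Za-z0-9._-]+)$')

if __name__ == '__main__':
    for path in ('/wx64/setup.exe', '/wx64/a/b', '/wx64/'):
        broad = BROAD.match(path)
        explicit = EXPLICIT.match(path)
        print(path, bool(broad), bool(explicit))
```

Compiling the patterns once at import time also avoids calling `re.compile` on every request.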

@abhinavsingh
Owner

Thank you for the PR. I'll get into it over the weekend. We can likely move the discussion to the PR from then on for more specifics about regex and callback types.

@sunnyjocker

Script I'm working on right now (assuming initially proposed changes) is as follows:

import proxy
from typing import List, Tuple
from proxy.http.server import ReverseProxyBasePlugin

class ReverseProxyPlugin(ReverseProxyBasePlugin):
    def routes(self) -> List[Tuple[str, bytes]]:
        return [
                (r'/path1/(.*)', b"https://127.0.0.1:8443/{0}"),
                (r'/path2/(.*)', b"https://127.0.0.1:7443/{0}"),
                (r'/path3/(.*)', b"https://127.0.0.1:6443/{0}")
        ]

if __name__ == "__main__":
    with proxy.Proxy(
            input_args=[
                "--threaded",
                "--enable-reverse-proxy",
                "--hostname=0.0.0.0"
            ],
            enable_web_server=True,
            port=443,
            plugins=[
                ReverseProxyPlugin,
            ],
    ) as _:
        proxy.sleep_loop()

This is simple and self-contained, and would be significantly less of a headache than nginx or apache since they use so many config files.

I tested this code. It seems that upstream servers on ports 8443, 7443 and 6443 do not work for https reverse proxying, but 443 seems fine, and http reverse proxying can use any port. I'm not sure whether this is a bug or I'm missing something.
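
One possible explanation, assuming the port-selection expression in the handle_request code quoted at the top of this thread is what runs here: in Python a conditional expression binds more loosely than `or`, so `port or A if cond else B` is read as `(port or A) if cond else B`. The sketch below reproduces only that expression; the function names are hypothetical and the default port values 80 and 443 are assumed.

```python
from typing import Optional

DEFAULT_HTTP_PORT = 80     # assumed value
DEFAULT_HTTPS_PORT = 443   # assumed value


def port_as_quoted(port: Optional[int], scheme: bytes) -> int:
    # Same shape as the quoted expression: the explicit port is only
    # consulted in the http branch.
    return port or \
        DEFAULT_HTTP_PORT \
        if scheme == b'http' \
        else DEFAULT_HTTPS_PORT


def port_with_parentheses(port: Optional[int], scheme: bytes) -> int:
    # Parenthesised so that an explicit port wins for both schemes.
    return port or (
        DEFAULT_HTTP_PORT if scheme == b'http' else DEFAULT_HTTPS_PORT
    )


if __name__ == '__main__':
    print(port_as_quoted(8443, b'https'))          # 443
    print(port_as_quoted(8090, b'http'))           # 8090
    print(port_with_parentheses(8443, b'https'))   # 8443
```

This matches the reported symptoms (https upstreams only reachable on 443, http upstreams on any port), but it has not been confirmed against the version that was tested.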

@abhinavsingh
Owner

@sunnyjocker I'll double check shortly and update back on what's going on and what to expect. Thanks for bringing this to my attention.

@sunnyjocker

Thanks for the help, looking forward to it. For security reasons, I want to deploy the reverse proxy on the public network and the real service on my lab's LAN. I tried an "https to http" configuration (an https request routed to an http URL), but it doesn't work. I guess the reverse proxy just passes the traffic through.
