
Allow manipulating request body, headers and method before a request is proxied #2


Open · wants to merge 3 commits into base: main

Conversation

Elettrotecnica

In this PR I implement some extra flags for the upstream API that allow sending a request to a backend where the request body and headers are not those of the current connection, but completely arbitrary.

This makes it possible to create wrappers around web APIs that can return a streaming response, such as https://github.com/ollama/ollama/blob/main/docs/api.md

In my wrapper I can then differentiate between:

  • "normal" calls -> build a request, send it to the endpoint via e.g. ns_http, return the reply
  • streaming calls -> build the request, send it to the backend, let the backend serve the reply through us as proxy
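The two call paths above can be sketched as follows. This is a minimal Python illustration of the dispatch idea only, not revproxy's actual API; `call_backend` and `chunks_from_backend` are hypothetical names, and `chunks_from_backend` stands in for the real transport (ns_http in the buffered case, the proxied backend channel in the streaming case).

```python
def chunks_from_backend(prompt):
    """Hypothetical stand-in for the backend connection; yields reply chunks."""
    for piece in ("Hello", ", ", "world"):
        yield piece

def call_backend(prompt, stream=False):
    """Dispatch between a buffered ("normal") call and a streaming call."""
    if not stream:
        # "normal" call: collect the full reply, then return it at once
        return "".join(chunks_from_backend(prompt))
    # streaming call: hand the chunks through as they arrive,
    # letting the caller (or the proxy machinery) deliver them piecewise
    return chunks_from_backend(prompt)
```

In the buffered case the wrapper owns the whole reply before answering; in the streaming case it only relays an open channel.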

The previous behavior should be unchanged by this new feature.

Many thanks for any feedback on this.

@gustafn
Contributor

gustafn commented Nov 29, 2024

I have trouble understanding the terminology: what is a "normal" and a "not normal" request? By their nature, ns_connchan requests are streaming requests, since they do not receive a request in full but follow an I/O-driven model. The ns_http-based approach (as we have it now) has to receive the full request.

From the description, I do not understand what you are trying to achieve. I got a rough understanding of the intentions by looking at the ollama "streaming", but the details are missing there as well. Does "streaming" here refer to HTML streaming (i.e. long-running requests, delivering information piecewise)? This is already handled to a certain degree in revproxy, causing e.g. a change in delivery method from ns_http to ns_connchan.

What do you mean by "build the request"? Do you refer to the request header? Do you perform request splitting?

There are several changes in the PR that have nothing to do with the topic. Can you produce multiple independent PRs from this, or do I have to do some cherry-picking before we get to the essentials?

@Elettrotecnica
Author

Does "streaming" here refer to HTML streaming (i.e. long-running requests, delivering information piecewise)?

Yes: in the case of streaming requests, ollama will keep the connection open until all chunks of the reply have been sent, each as a separate JSON object.
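Consuming such a reply looks roughly like the following Python sketch. The two sample payloads are invented for illustration; the `response` and `done` field names follow the ollama `/api/generate` API documentation linked above.

```python
import json

# Each streamed chunk is its own JSON object on one line (NDJSON),
# as ollama emits while the reply is being generated.
raw_chunks = [
    '{"response": "The sky ", "done": false}',
    '{"response": "is blue.", "done": true}',
]

text = ""
for line in raw_chunks:
    obj = json.loads(line)
    text += obj["response"]       # accumulate the partial answer
    if obj["done"]:               # final chunk closes the stream
        break

print(text)  # → The sky is blue.
```

A plain (buffered) client only sees the result once the last chunk arrives; a streaming consumer can forward each chunk as it comes in.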

From the description, I do not understand what you are trying to achieve.

I want to open an HTTP request to the ollama backend with a "streaming" reply. I do not want to expose the backend "as-is" via a regular proxy, but via a wrapper class that takes care of building the request (body, headers and so on). The request should afterwards be handled exactly as if it were a proxied request, i.e. with the frontend and backend channels, callbacks, error handling, and everything else revproxy does.

Currently, revproxy always assumes that the current request's headers and body are what is sent to the backend. My changes make it possible to craft the request to be proxied instead.

IMO the changes are on topic, in the sense that they only introduce the intended feature in the API. I have moved some logic from the specific backend code into the generic API, because the backends were doing a lot of the same things.

I hope it is clearer now what I am trying to achieve. One way to avoid needing any of this would be for ns_http to support streaming responses, e.g. with a callback invoked whenever data arrives, similar to what the Tcl http package does (see https://www.tcl.tk/man/tcl8.6/TclCmd/http.htm#M23)

@Elettrotecnica
Author

One way to avoid needing any of this would be for ns_http to support streaming responses, e.g. with a callback invoked whenever data arrives, similar to what the Tcl http package does (see https://www.tcl.tk/man/tcl8.6/TclCmd/http.htm#M23)

In fact this is only partially true: even with a callback like this, I would still need to set up my own streaming delivery of the response on the frontend. Proxying the reply has the advantage that it is the backend doing the work.
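The callback model under discussion can be sketched like this. The function names are hypothetical (this is neither ns_http nor the Tcl http API); the point is that the callback only solves the receiving side, while delivery to the frontend client remains the wrapper's job.

```python
def fetch_streaming(chunks, on_data):
    """Callback-driven client: on_data fires once per received chunk,
    similar in spirit to the -handler option of the Tcl http package."""
    for chunk in chunks:
        on_data(chunk)

# Even with such a callback, the wrapper must still push each chunk
# onward to its own client; proxying lets the backend drive that delivery.
delivered = []
fetch_streaming(["a", "b", "c"], delivered.append)
```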

@gustafn
Contributor

gustafn commented Dec 16, 2024

Why have you silently dropped the binary handling?

@Elettrotecnica
Author

I took it out of the specific implementations and put it into the revproxy-procs.tcl file. The intention was to factor the behavior out, since I am introducing the possibility to specify one's own content.

@gustafn
Contributor

gustafn commented Dec 18, 2024

I still don't understand in detail what you want to achieve. HTTP streaming was already supported before your changes, and it is used daily, e.g. on openacs.org. My guess is that you were fighting with timeouts. I have made the HTML streaming implementation more robust against non-working configurations by adjusting the timeouts on the fly when HTTP streaming is detected.

I have also tested the implementation against ollama.

Without reverse proxy:

% curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'

Starting the reverse proxy with the configuration file from the revproxy README on GitHub:

nsd_revproxy_target=http://localhost:11434 /usr/local/ns/bin/nsd -f -u nsadmin -g nsadmin -t /usr/local/ns/conf/nsd-config-revproxy.tcl

Running the curl request via the reverse proxy:

% curl http://localhost:48000/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'

Is there something else to test?
