Skip to content

Add interoperability with InputStream/OutputStream with WebClient #28362

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

raphw
Copy link

@raphw raphw commented Apr 20, 2022

I wanted to suggest better interoperability of the Spring WebClient and InputStream and OutputStream APIs. While I understand the asynchronous nature of the client, it is also suggested as a replacement for the synchronousRestTemplate and still many uses of the client will be synchronous to a blocked Mono. In this context, it can make sense to use the Java IO API for streams which is compatible to many existing libraries for IO. In my case, this would also offer a simple starting point for migrating from RestTemplate and other HTTP APIs.

The idea would be to use the client as follows:

WebClient webClient = WebClient.builder().build();
String result;
try (InputStream input = webClient.post().uri("http://localhost:8080/hello/world")
    .body(BodyInserters.fromOutputStream(out -> out.write("Hello!".getBytes(StandardCharsets.UTF_8))))
    .retrieve()
    .bodyToMono(InputStream.class)
    .block()) {
  result = new String(input.readAllBytes());
}

This is still work in progress, I am not overly familiar with the Reactor API, but the idea was that the streams would be read or written asynchronously while consuming them. This might also become a much more relevant approach once project Loom makes blocking a thread a more feasible approach where this simple approach might become preferable in some cases.

Would you consider such an addition?

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Apr 20, 2022
@bclozel
Copy link
Member

bclozel commented Apr 21, 2022

I don't think we want to allow InputStream / OutputStream types here, as this mixes the error handling: imperative with exceptions being thrown and reactive with error signals.

I think it would be better to do the following:

OutputStream output = //...
Flux<DataBuffer> buffers = webClient.post().uri("http://localhost:8080/hello/world")
    .bodyValue("Hello!")
    .retrieve()
    .bodyToFlux(DataBuffer.class);

Flux<DataBuffer> writeResult = DataBufferUtils.write(buffers, output);
// Instead of blocking, you could reuse the publisher here to connect to other APIs or synchronize with other asynchronous work.
writeResult.blockLast();

@raphw
Copy link
Author

raphw commented Apr 21, 2022

From my perspective, where I am currently migrating a rather large project, I disagree with your conclusion. The javadoc for RestTemplate, which is widely used today, states:

Please, consider using the org.springframework.web.reactive.client.WebClient which has a more modern API and supports sync, async, and streaming scenarios.

As things are, RestTemplate offers access to the request's OutputStream via HttpOutputMessage and the response's InputStream via HttpInputMessage. If the use of WebClient should be considered as a way forward from RestTemplate, I consider it meaningful that one can opt into using the same primitives of the former API, what makes refactoring large code bases first feasible. As by your own documentation, WebClient is intended also in synchronous use cases where streams make sense as an abstraction, especially when considering Loom in future JVM versions. And even without Loom, I argue that synchronous APIs are much more common today and will stay the standard approach also for many years to come.

Similar to your suggestion, in my current migration attempt, I am using byte arrays as intermediaries for input and output streams (via ByteArrayInputStream and ByteArrayOutputStream), but the memory usage and through-put of the application has worsened significantly due to the intermediate buffers. I am afraid, that this "easy" solution will be picked up by many, to emulate the "old API" in the synchronous use cases, what will result in a practical reduction of performance of many web-client-migrated Spring applications as the path of least resistance is often chosen.

Also, InputStream and OutputStream are omnipresent in the Java library landscape and it is used by many third-party libraries, so many users will at some point have a need to create compatibilities with these streams, even if there was a will to use the reactive APIs. Even DataBuffers allow unwrapping to InputStream, so clearly you are aware of this.

Please consider also that the JVM's own HttpClient offers such APIs for the response via BodyHandlers.toInputStream() and similarly for the request via BodyPublishers.ofInputStream. In my opinion this is a clear recognition that there is a need for such interoperability. If you ever wanted to present a HttpClient facade for Spring's WebClient you would also need such codes.

@raphw raphw force-pushed the webfluxstream-interop branch 7 times, most recently from 1289be3 to 3068dc3 Compare April 21, 2022 11:40
@raphw
Copy link
Author

raphw commented Apr 29, 2022

@bclozel Are you considering this feature? We are currently considering to move things to RestTemplate instead but I wanted to see if it might be worth waiting to safe time on a future migration.

@bclozel
Copy link
Member

bclozel commented May 2, 2022

If the use of WebClient should be considered as a way forward from RestTemplate, I consider it meaningful that one can opt into using the same primitives of the former API, what makes refactoring large code bases first feasible.

I think you might be reading too much into RestTemplate's Javadoc. RestTemplate is not deprecated and we don't expect Spring developers to "migrate" their codebases to WebClient. RestTemplate is merely in maintenance mode, meaning we won't be adding new features to it related to modern use cases such as streaming and reactive streams (see #19448 for example). We've refined our docs in #24503 to ensure that the community doesn't get the wrong idea.

As stated in my previous comment, I'm personally not in favor of supporting this use case. I think that WebClient is also tailored for scatter/gather and streaming scenarios. Exposing InputStream and OutputStream here is likely to create tricky issues for developers when it comes to backpressure, error signals or using reactor operators (such as retry). Your current implementation is not only buffering without backpressure, but also violating the "lazy" principle of reactive streams by subscribing to the upstream publisher before the stream is even read.

And even without Loom, I argue that synchronous APIs are much more common today and will stay the standard approach also for many years to come.

Again, we never expected that the majority of the Spring community would switch to reactive entirely. WebClient supports advanced usage and features that RestTemplate never had in the first place. We've also made sure that reactive types are supported in MVC Controllers to let developers compose those types and not worry about blocking.

Similar to your suggestion, in my current migration attempt, I am using byte arrays as intermediaries for input and output streams (via ByteArrayInputStream and ByteArrayOutputStream), but the memory usage and through-put of the application has worsened significantly due to the intermediate buffers.

The code snippet in my previous comment doesn't allocate additional byte arrays nor buffers content. Allocation should be better than RestTemplate thanks to the pooled data buffers. Did you run into issues while trying this?

I'd really like to understand how the proposed DataBufferUtils approach doesn't work out for you - maybe we can improve something there? I'd say that in your case, if imperative APIs and InputStream/OutputStream types are important and composition of asynchronous calls is not interesting, then RestTemplate is still a very much valid choice. I wouldn't bother "migrating" to WebClient if it doesn't bring real benefit for your use case.

@bclozel bclozel added the status: waiting-for-feedback We need additional information before we can continue label May 2, 2022
@raphw raphw force-pushed the webfluxstream-interop branch 4 times, most recently from e657423 to be23341 Compare May 2, 2022 18:58
@raphw
Copy link
Author

raphw commented May 2, 2022

I think you might be reading too much into RestTemplate's Javadoc.

Spring has established a communication where RestTemplate will not get much attention in the future, of course this makes us look into the proposed alternative, even if the rhetoric has become softer on the template's future. But even today, Spring's javadoc states that:

{@code org.springframework.web.reactive.client.WebClient} [...} has a more modern API and supports sync, async, and streaming scenarios.

I conclude that streams are the most established primitive of sync Java I/O and this scenario is purely supported. This is in contrast to Java's own async HTTP client or also the Apache async HTTP client where streams are supported.

My working hypothesis with WebClient was that I wanted to use the client in an application where both async (few) and sync (most) use cases are in use. My experience is that the WebClient is not feasible as I constantly need to integrate with existing code which contains APIs like the following:

interface DomainConsumer {
  void consume(InputStream in, Consumer<T> c, Class<T> type, DomainContext ctx);
}

I would not be able to work with DataBuffers here as you suggested as the API is set and I cannot rewrite the entire application from scratch. With your suggestion, I would write to a DataBuffer mono and block on it. (Note that I require an InputStream.) This triggered exceptions as the memory buffer is limited in the decoder. If I increase the buffer size, things work out but the applications memory usage shoots up. Of course, my suggested solution blocks threads, but for my application this is not an issue. The few places where threading met its limits, my plan was to actually refactor the code to make use of reactive API if Loom does not solve the issue before I get to rewrite these segments.

I do neither think this is only my problem. I would expect that many users of WebClient will be required to integrate against some form of stream API at some point. I would expect streams to be one of the most established Java library primitives which is why I would hope for their support.

Alternatively, I could use Spring's WebClient for the async and RestTemplate for sync use. This does however imply that I configure two underlying clients what triggers new problems. Our connection pools per route would not longer interact, refactorings to async become more complicated as it requires a client rewrite on top of everything, and for our junior dev, the technological stack is large enough as it is, I am happy over any complication that I can avoid.

Your current implementation is not only buffering without backpressure, but also violating the "lazy" principle of reactive streams by subscribing to the upstream publisher before the stream is even read.

I was aiming for a POC, to be honest, I am not an expert on reactive programming. But I now refactored the implementation to the best of my knowledge and added some tests to provide a better means of discussion. I understand that this is non-trivial, but this is why I would like to see native support for this decoding/encoding as I would expect many to end up in a situation where this is needed where anybody would implement an overly simplified implementation of this feature.

That said: maybe a better API would be something along the lines of:

Mono custom = webClient.post().uri("http://localhost:8080/hello/world")
.bodyValue("Hello!")
.retrieve()
.streamToMono(inputSteam -> ...);

This way, the error handling would be consistent but one could still use reactive primitives. I also added a suggestion for this solution as static factories in BodyExtractors.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels May 2, 2022
@raphw raphw force-pushed the webfluxstream-interop branch from be23341 to 317b7d2 Compare May 2, 2022 19:54
@bclozel
Copy link
Member

bclozel commented May 2, 2022

With your suggestion, I would write to a DataBuffer mono and block on it. (Note that I require an InputStream.)

DataBufferUtils.readInputStream can also help to turn InputStream into Flux<DataBuffer>, solving all the allocation and buffering issues you've shared.

Our connection pools per route would not longer interact, refactorings to async become more complicated as it requires a client rewrite on top of everything, and for our junior dev, the technological stack is large enough as it is, I am happy over any complication that I can avoid.

No matter how we consider the problem, there's an important mismatch between InputStream/OutputStream and reactive APIs. I don't think it would be wise to surface those in the WebClient API. Blocking should only happen, if necessary, "at the edges" once the concurrency and asynchronous aspects have been dealt with. If the API of your application doesn't allow that, or you've invested significant efforts into RestTemplate and see no immediate benefit from using WebClient, then RestTemplate is definitely the best choice here.

I'm going to discuss this issue with other team members but in the meantime, please don't invest more time in this PR - as it stands, we're not likely to address this.

@raphw
Copy link
Author

raphw commented May 2, 2022

DataBufferUtils.readInputStream can also help to turn InputStream into Flux<DataBuffer>, solving all the allocation and buffering issues you've shared.

I do not think that's a common issue but it is the other way round. Many APIs accept an InputStream for a response primitive and there is no tool that converts a Flux<DataBuffer> to an InputStream, which is what I am trying to address.

Blocking should only happen, if necessary, "at the edges" once the concurrency and asynchronous aspects have been dealt with.

In my case, blocking is necessary as our code base is too large and we integrate against libraries that only accept InputStreams. My guess is that others will face the same problem and solve it by mapping any request to a byte[] or String and to use this value to create an ByteArrayInputStream what was my first attempt, too. When removing the decoder, with the proposed BodyInserter and BodyExtractor, I think there's a viable alternative for those who are bound to stream APIs today.

I'm going to discuss this issue with other team members but in the meantime, please don't invest more time in this PR - as it stands, we're not likely to address this.

I am working on a POC for a largish project, so it is not a big problem. I have neither spent much time on this and one idea was to host the code as part of our project. As things stand, we need to go for a solution that a large team can work with and the compatibility issues might lead us down to abandon the WebClient altogether and go for rest template or another async HTTP client that supports streams. I would have hope an easier bridge could let us use the web client to open up for an easier migration and use reactive programming where it really benefits us. We have a whole landscape of different HTTP clients in use today and the idea was to unify this behind one facade, but if course the sync use case needs good support to justify a choice here.

I still hope you can consider this, and thanks for your opinion along the way!

@raphw
Copy link
Author

raphw commented May 5, 2022

Another observation: while RestTenplate was un-deprecated, I noticed that the Netty and Http conponents async request factories still are deprecated and point to using WebClient instead.

If RestTemplate should be preferred over WebClient, is there a plan to reestablish such integrations? If using the same backend client was an option, this would of course be an alternative where one could use both RestTemplate and WebClient in parallel, for example both based on Netty 5. One option would be to offer this for the built-in JDK client. With this option, I could base the entire application upon this single client and use the appropriate abstraction upon it. I added some code from my current prototype to exemplify this usage.

@raphw raphw force-pushed the webfluxstream-interop branch 6 times, most recently from e281c15 to 07f5dc3 Compare May 9, 2022 08:18
@raphw raphw force-pushed the webfluxstream-interop branch 2 times, most recently from 7c86eee to 5f3c2fa Compare May 9, 2022 13:26
@raphw
Copy link
Author

raphw commented May 9, 2022

My current prototype for one of my customer aims to allow for the use of WebClient and RestTemplate by adding a request factory for RestTemplate that uses WebClient. I also added this experiment for consideration. This would allow me to bootstrap the WebClient for use in the "new world" that needs reactive and simply use a rest template based upon it by delegating all RestTemplate use to the WebClient. This way I would also not to keep two client implementations in sync when it comes to route limits, header tokens, timeouts etc.

Would you consider this a feasible solution?

… a JDK client connector for RestTemplate.

This change simplifies the use of syncronous HTTP APIs in Spring by adding support for InputStream/OutputStream for WebClient. While WebClient advertises that it offers an easy API for synchronous and asyncronous APIs, it is currently not possible to integrate against libraries that only work with InputStream or OutputStream. Typically, this leads to intermediate manifestation of inputs and outputs as byte arrays what causes additional overhead. This change therefore suggests a codec where the incoming ByteBuffers are piped to an InputStream and body extractors and inserters that work with streams. While this still requires blocking a thread, it avoids the intermediate buffering. It also simplifies code as the intermediate buffering does not need to be implemented.

Additionally, this change suggests a ClientHttpRequestFactory for RestTemplate that uses the JDK client directly or any new connector via a WebClient-based request factory. As the RestTemplate is no longer advertised as to be deprecated, and since the JDK client offers APIs for synchronous use with streams, offering this support seems sensible. This also allows for using RestTemplate and WebClient based on the same client which reduces the need to, for example, setup proxy configurations for two different HTTP client implementations.
@raphw raphw force-pushed the webfluxstream-interop branch from 5f3c2fa to c1f7bdd Compare May 9, 2022 21:13
@rstoyanchev
Copy link
Contributor

rstoyanchev commented May 11, 2022

Thanks for discussion @raphw.

Generally, there's nothing wrong with a bridge between java.io (blocking) streams and Reactive Streams. We already have methods in DataBufferUtils for InputStream -> Publisher and Publisher -> OutputStream. We just didn't consider it yet necessary to go in the other direction.

Streaming in the RestTemplate is at a low level. You have to write to an OutputStream for the request or read from InputStream for the response, but nothing for encoding and decoding because fundamentally there is no abstraction for a stream of any object <T> like there is with a Reactive Streams Publisher<T>. This is where WebClient can and does support streams of higher level objects, and that is why generally we expect you would work at the level of objects rather than byte streams.

I'm wondering whether you have your own encoding/decoding layer, or otherwise what is the reason for going to byte streams? I understand you have existing libraries/applications to work with, but if you could, please elaborate on what they do. This is relevant because there might be a path through creating Encoder and Decoder, or using and/or enhancing existing ones. It's generally useful to understand the scenarios that motivate this.

Or perhaps there is no encoding/decoding involved at all, and you are just passing the bytes to/from another source. This is where our methods in DataBufferUtils come in. They allow you to provide an external resource to read from, in the form of an InputStream, or an external resource to write to, in the form of an OutputStream.

At the implementation level, I agree with Brian that the PR, in its present state, isn't the way to go. The most straightforward path for what you want to do is through methods to bridge OutputStream -> Publisher and Publisher -> InputStream. This is similar to the ones provided in the JDK HttpClient and could exist in DataBufferUtils if we decide to have them added.

That said please provide more context so we can better understand the need for this.

Last comment, since you compared to the JDK HttpClient, on the request side BodyPublishers#ofInputStream is essentially InputStream -> Publisher which is the same as what we offer currently in DatatBufferUtils. They also don't offer OutputStream -> Publisher. Can you also clarify your actual needs for the request and response sides?

@raphw
Copy link
Author

raphw commented May 11, 2022

There are two reasons:

In the application I am working with, the same type might be represented differently depending on the specific path and even return status, what is of course not ideal but how things are as some applications get their data from other applications which all use slightly different setups of Jackson. The code is much more readable if the response is translated in the response handler where path and response handler are bundled, this is why we combine a request and a ResponseHandler which currently accepts an InputStream and processes this in various ways (JSON/XML/Atom).

In other places, we are reading an InputStream with binary data and want to process data as it arrives. The data processor is currently based on an input stream and we would not want to rewrite it as it is working well and we do not have a need to do this non-blocking. We have a similar implementation for an OutputStream that writes these binary messages on the fly without buffereing. Both works well with the RestTemplate but I do not see a way to do the same in WebClient.

You are right about the JDK client not accepting OutputStream but InputStream for a request body but this can be easily bridged by implementing a Publisher, similar to my suggestion.

I understand your suggestion and that you would prefer this in a DataBufferUtils. For me, I really do not care how I would achieve this, as long as it does not require too much boilerplate. I can only imagine that it is a common need: even if things can be solved differently and more efficiently, sometimes an InputStream/OutputStream is the best way to plug existing code as this is widely used core Java API and a way to interface different libraries.


Did you also consider the argument about using RestTemplate and WebClient side-by-side? If you offered a ClientHttpRequestFactory that accepts a WebClient, one could introduce a WebClient in a project without rewriting all RestTemplate code. We would want to avoid using two different HTTP clients side by side in one app as we want to use a common setup for route limits etc. Currently, WebClient and RestTemplate do not support any common HTTP client. By using a ClientRequestFactory for WebClient one could define one setup for it and simply mirror it to the rest template.

@rstoyanchev
Copy link
Contributor

rstoyanchev commented May 17, 2022

DataBufferUtils is already the place for integration between Publisher<DataBuffer> and java.io streams. Anything further along the same lines belongs there as a foundation that is much more widely applicable to any reactive chain.

This could be further integrated into the WebClient but I don't think it fits. Its encoding/decoding layer is built on Reactive Streams contracts. A Decoder returning Mono<InputStream> is a mismatch for the deferred nature of Reactive Streams, or it could be Mono<Supplier<InputStream>>, but this is still a strange way to do it, essentially tunnelling through the Mono. On the Encoder side, where we expect an "input" publisher of objects to encode, it's not even an option to pass something that wants to write to an OutputStream. At best it could accept an InputStream (probably why JDK BodyPublishers has ofInputStream only). That would require WebClient to be configurable with an Executor for the blocking reads.. It is arguably best left as an external layer. The same reasoning applies to BodyInserter and BodyExtractor I'll skip that detail here.

It sounds like you want to replace the encoding/decoding layer with your own, but WebClient is again built on the Encoder and Decoder contracts, and it's best to work with implementations of those contracts. In that sense, I don't think this is a very common case and this is the first request we've seen on this.

For your first case with alternating between different codecs based on path, this feels like something that should be possible to achieve with built-in codecs (possibly a filter that switches the content-type, or exchangeToMono to choose the type to decode to). For the second case, with a binary format, if it is popular, we could consider an additional implementation, for others to benefit from.

As for RestTemplate over the WebClient, it's not a direction we would consider. Technical details aside, the RestTemplate is an ageing API that we don't wish to revive as a way forward for imperative use. This is an area that we are thinking about but there is not much to share.

For the time being, the best option is to rely on Encoder/Decoder, either built-in or additional implementations as much as feasible. We can consider a possible expansion of DataBufferUtils with more options for Publisher and java.io streams integration. I should also mention that we now have an interface client emerging for 6.0 that provides another potential direction. It supports flexible method signatures including Reactive and imperative usage style.

@raphw
Copy link
Author

raphw commented May 19, 2022

I don't think this is a very common case and this is the first request we've seen on this.

I think it is more common, but the natural solution is to read and write byte arrays what is normally straight-forward. Most developers take the most accessible "common" API that one can find when integrating different libraries and do not think more about it. byte[] is a JDK type and two libraries that do not know each other can usually communicate using this primitive. With big payloads, this is however often infeasible which leads to using InputStream and OutputStream, another JDK primitive that most libraries understand. If one can navigate to this primitive via DataBufferUtils, and if this is straight-forward to use, I'd be personally happy enough, now I know where to look. Even though I think most Spring users would naturally search for a way in BodyInserters or BodyExtractors.

My issue with the current API is that it uses Flux<DataBuffer> as its surface. Neither type is a JDK primitive and even most seasoned Spring users do not know a thing about either of these types. So I would have hoped that there was an easy escape hatch for those looking for one. I would neither expect many developers being able to develop their own encoders or those encoders to be maintainable in a big, shifting team, that's why I would have hope an easy break for cases where custom handling is needed for the cases that I described. "Weird" code is much more common in the enterprise than one would hope. And some of the hatred Spring gets is exactly that, that it covers the common case well but makes the uncommon case impossible. With WebClient, I argue that this impression is currently true, unfortunately.

As for RestTemplate over the WebClient, it's not a direction we would consider. Technical details aside, the RestTemplate is an ageing API that we don't wish to revive as a way forward for imperative use.

While rest template might be aging, it is trivial to use in the non-reactive case. To most users of Spring, reactive streams are alien, and will stay alien, especially with Loom on the horizon. And this is where I feel like Spring is giving out mixed messages. Should I argue that the WebClient API does not support the non-reactive, blocking use well enough and ask for an improvement? Or is web client for reactive, asynchronous programming, and I should still expect that RestTemplate is the choice I should take? Just weeks ago, rest template was advertised as deprecated, and while this disclaimer was removed I am under the impression that the supporting infrastructure, such as the request factories, are already under removal.

I would therefore hope that rest template would be better supported by adding more request factories, for example for the built-in JDK client, or just for web client what would allow inheriting the latter. Or that Spring would improve the support of web client for the blocking, non-reactive use, which is likely the use of the vast majority of Spring users.

@rstoyanchev rstoyanchev added the in: web Issues in web modules (web, webmvc, webflux, websocket) label Jan 24, 2023
@poutsma
Copy link
Contributor

poutsma commented Sep 7, 2023

Closing this PR in favor of #31184, which attempts to solve the same problem in a more generic way.

@poutsma poutsma closed this Sep 7, 2023
@snicoll snicoll removed the status: waiting-for-triage An issue we've not yet triaged or decided on label Sep 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) status: feedback-provided Feedback has been provided
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants