Description
Hi, I'm filing this with the h2 repo because I have a PR, which I'll send momentarily, that applies to h2, although I think this topic may pertain equally to h2, hyper, reqwest, tonic, warp, etc.
While looking at reducing compile times of an application which has dependencies, both direct and indirect, on the hyperium stack, I noticed that h2 and hyper show up in the output of cargo bloat a lot. Here's an example of all such symbols > 5KiB for this particular closed-source application:
File .text Size Crate Name
0.0% 0.3% 39.7KiB h2 h2::proto::connection::Connection<T,P,B>::poll2
0.0% 0.2% 36.3KiB h2 h2::codec::framed_read::FramedRead<T>::decode_frame
0.0% 0.2% 36.3KiB h2 h2::codec::framed_read::FramedRead<T>::decode_frame
0.0% 0.2% 36.3KiB h2 h2::codec::framed_read::FramedRead<T>::decode_frame
0.0% 0.2% 36.3KiB h2 h2::codec::framed_read::FramedRead<T>::decode_frame
0.0% 0.2% 36.3KiB h2 h2::codec::framed_read::FramedRead<T>::decode_frame
0.0% 0.2% 36.3KiB h2 h2::codec::framed_read::FramedRead<T>::decode_frame
0.0% 0.2% 28.1KiB h2 h2::proto::connection::Connection<T,P,B>::poll
0.0% 0.2% 28.1KiB h2 h2::proto::connection::Connection<T,P,B>::poll
0.0% 0.2% 27.4KiB tonic h2::proto::connection::Connection<T,P,B>::poll
0.0% 0.2% 27.3KiB rusoto_credential h2::proto::connection::Connection<T,P,B>::poll
0.0% 0.2% 27.2KiB reqwest h2::proto::connection::Connection<T,P,B>::poll
0.0% 0.1% 18.9KiB http http::header::name::parse_hdr
0.0% 0.1% 18.9KiB tonic h2::proto::streams::prioritize::Prioritize::poll_complete
0.0% 0.1% 18.9KiB reqwest h2::proto::streams::prioritize::Prioritize::poll_complete
0.0% 0.1% 18.8KiB rusoto_credential h2::proto::streams::prioritize::Prioritize::poll_complete
0.0% 0.1% 18.8KiB <my-crate> h2::proto::streams::prioritize::Prioritize::poll_complete
0.0% 0.1% 18.1KiB hyper hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_loop
0.0% 0.1% 18.1KiB hyper hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_loop
0.0% 0.1% 17.9KiB hyper <hyper::proto::h1::role::Server as hyper::proto::h1::Http1Transaction>::encode
0.0% 0.1% 17.8KiB reqwest hyper::proto::h1::decode::Decoder::decode
0.0% 0.1% 17.7KiB <my-crate> hyper::proto::h1::decode::Decoder::decode
0.0% 0.1% 17.3KiB hyper hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_loop
0.0% 0.1% 17.2KiB hyper hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_loop
0.0% 0.1% 16.7KiB hyper hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch
0.0% 0.1% 16.7KiB hyper hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch
0.0% 0.1% 16.6KiB h2 h2::frame::headers::HeaderBlock::load::{{closure}}
0.0% 0.1% 15.9KiB h2 h2::proto::streams::prioritize::Prioritize::pop_frame
0.0% 0.1% 15.8KiB hyper hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch
0.0% 0.1% 13.9KiB h2 <h2::server::Peer as h2::proto::peer::Peer>::convert_poll_message
0.0% 0.1% 13.0KiB rusoto_credential hyper::proto::h1::decode::Decoder::decode
0.0% 0.1% 12.7KiB hyper hyper::proto::h1::decode::Decoder::decode
0.0% 0.1% 12.7KiB hyper hyper::proto::h1::decode::Decoder::decode
0.0% 0.1% 12.6KiB tonic hyper::proto::h1::decode::Decoder::decode
0.0% 0.1% 11.8KiB <my-crate> h2::proto::connection::Connection<T,P,B>::poll
0.0% 0.1% 11.3KiB h2 h2::hpack::decoder::Decoder::decode
0.0% 0.1% 11.2KiB tonic h2::codec::framed_write::FramedWrite<T,B>::buffer
0.0% 0.1% 11.2KiB rusoto_credential h2::codec::framed_write::FramedWrite<T,B>::buffer
0.0% 0.1% 11.2KiB reqwest h2::codec::framed_write::FramedWrite<T,B>::buffer
0.0% 0.1% 11.2KiB <my-crate> h2::codec::framed_write::FramedWrite<T,B>::buffer
0.0% 0.1% 10.4KiB rusoto_credential <hyper::client::pool::Checkout<T> as core::future::future::Future>::poll
0.0% 0.1% 10.4KiB reqwest <hyper::client::pool::Checkout<T> as core::future::future::Future>::poll
0.0% 0.1% 10.4KiB <my-crate> hyper::client::pool::Checkout<T> as core::future::future::Future>::poll
0.0% 0.1% 10.3KiB h2 h2::codec::framed_write::FramedWrite<T,B>::buffer
0.0% 0.1% 10.3KiB h2 h2::codec::framed_write::FramedWrite<T,B>::buffer
0.0% 0.1% 10.2KiB hyper? <hyper::proto::h2::server::Server<T,S,B,E> as core::future::future::Future>::poll
0.0% 0.1% 9.8KiB hyper <hyper::proto::h1::role::Server as hyper::proto::h1::Http1Transaction>::parse
0.0% 0.1% 9.8KiB reqwest h2::proto::streams::streams::Streams<B,P>::recv_headers
0.0% 0.1% 9.6KiB hyper <hyper::proto::h1::role::Client as hyper::proto::h1::Http1Transaction>::parse
0.0% 0.1% 9.6KiB h2 h2::proto::streams::recv::Recv::recv_data
0.0% 0.1% 9.4KiB reqwest <hyper::client::conn::Connection<T,B> as core::future::future::Future>::poll
0.0% 0.1% 8.6KiB hyper hyper::client::pool::PoolInner<T>::put
0.0% 0.1% 8.6KiB hyper hyper::client::pool::PoolInner<T>::put
0.0% 0.1% 8.6KiB hyper hyper::client::pool::PoolInner<T>::put
0.0% 0.1% 8.4KiB <my-crate> hyper::proto::h2::client::ClientTask<B> as core::future::future::Future>::poll
0.0% 0.1% 8.3KiB hyper hyper::proto::h2::ping::Ponger::poll
0.0% 0.1% 8.2KiB rusoto_credential <hyper::proto::h2::client::ClientTask<B> as core::future::future::Future>::poll
0.0% 0.1% 8.0KiB tonic <hyper::proto::h2::client::ClientTask<B> as core::future::future::Future>::poll
0.0% 0.0% 7.4KiB hyper? <hyper::proto::h2::server::H2Stream<F,B> as core::future::future::Future>::poll
0.0% 0.0% 7.2KiB hyper <hyper::proto::h1::role::Client as hyper::proto::h1::Http1Transaction>::encode
0.0% 0.0% 7.1KiB h2 h2::hpack::encoder::Encoder::encode
0.0% 0.0% 6.8KiB h2 h2::proto::streams::prioritize::Prioritize::try_assign_capacity
0.0% 0.0% 6.7KiB hyper? <hyper::proto::h2::server::H2Stream<F,B> as core::future::future::Future>::poll
0.0% 0.0% 6.5KiB h2 h2::proto::streams::recv::Recv::recv_headers
0.0% 0.0% 6.1KiB hyper hyper::proto::h1::conn::Conn<I,B,T>::poll_read_head
0.0% 0.0% 6.1KiB hyper hyper::proto::h1::conn::Conn<I,B,T>::poll_read_head
0.0% 0.0% 6.0KiB <my-crate> http::header::map::HeaderMap<T> as core::iter::traits::collect::FromIterator<(http::header::name::HeaderName,T)>>::from_iter
0.0% 0.0% 5.8KiB tonic hyper::proto::h1::conn::Conn<I,B,T>::poll_read_head
0.0% 0.0% 5.8KiB rusoto_credential hyper::proto::h1::conn::Conn<I,B,T>::poll_read_head
0.0% 0.0% 5.8KiB reqwest hyper::proto::h1::conn::Conn<I,B,T>::poll_read_head
0.0% 0.0% 5.8KiB <my-crate> hyper::proto::h1::conn::Conn<I,B,T>::poll_read_head
0.0% 0.0% 5.7KiB hyper hyper::proto::h2::strip_connection_headers
0.0% 0.0% 5.6KiB h2 h2::hpack::table::Table::index
0.0% 0.0% 5.5KiB reqwest h2::proto::streams::streams::Streams<B,P>::recv_push_promise
0.0% 0.0% 5.5KiB h2? <h2::server::Handshake<T,B> as core::future::future::Future>::poll
0.0% 0.0% 5.5KiB hyper? <hyper::proto::h2::server::H2Stream<F,B> as core::future::future::Future>::poll
0.0% 0.0% 5.5KiB <my-crate> h2::codec::framed_write::FramedWrite<T,B>::flush
0.0% 0.0% 5.4KiB h2? <h2::server::Handshake<T,B> as core::future::future::Future>::poll
0.0% 0.0% 5.2KiB tonic h2::codec::framed_write::FramedWrite<T,B>::flush
0.0% 0.0% 5.2KiB reqwest h2::codec::framed_write::FramedWrite<T,B>::flush
0.0% 0.0% 5.1KiB hyper <hyper::server::tcp::AddrIncoming as hyper::server::accept::Accept>::poll_accept
0.0% 0.0% 5.1KiB tonic h2::proto::streams::prioritize::Prioritize::send_data
0.0% 0.0% 5.1KiB rusoto_credential h2::proto::streams::prioritize::Prioritize::send_data
0.0% 0.0% 5.1KiB reqwest h2::proto::streams::prioritize::Prioritize::send_data
0.0% 0.0% 5.1KiB <my-crate> h2::proto::streams::prioritize::Prioritize::send_data
0.0% 0.0% 5.1KiB h2 h2::proto::streams::prioritize::Prioritize::send_data
0.0% 0.0% 5.1KiB h2 h2::codec::framed_write::FramedWrite<T,B>::flush
0.0% 0.0% 5.1KiB tonic h2::proto::streams::streams::Streams<B,P>::recv_headers
0.0% 0.0% 5.1KiB rusoto_credential h2::proto::streams::streams::Streams<B,P>::recv_headers
0.0% 0.0% 5.0KiB h2? <h2::codec::Codec<T,B> as futures_core::stream::Stream>::poll_next
0.0% 0.0% 5.0KiB h2? <h2::codec::Codec<T,B> as futures_core::stream::Stream>::poll_next
0.0% 0.0% 4.9KiB tonic <h2::codec::framed_read::FramedRead<T> as futures_core::stream::Stream>::poll_next
0.0% 0.0% 4.9KiB rusoto_credential <h2::codec::framed_read::FramedRead<T> as futures_core::stream::Stream>::poll_next
As you can see, all of warp, tonic, reqwest, and rusoto are in the mix here. 12 of the top 20 largest symbols in the application are in this list (e.g. belong to h2).
Many of the symbols are repeated many times, presumably due to generic arguments & monomorphization. As a proof of concept I'll be sending a PR shortly which de-genericizes the largest symbol (FramedRead<T>::decode_frame); hopefully this can be done across more areas of the stack/codebase.
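To illustrate the general shape of that kind of de-genericization, here is a minimal sketch. It is not the actual h2 code: `Header`, `decode_header`, and `read_header` are hypothetical names, and the 4-byte header layout is invented for the example. The idea is that the generic function keeps only the part that truly depends on `R`, and the bulky parsing logic lives in a non-generic function that is compiled once.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical frame header (not h2's real type): 3-byte length + 1-byte kind.
#[derive(Debug, PartialEq)]
struct Header {
    len: u32,
    kind: u8,
}

/// Non-generic parsing logic. Because it has no type parameters, it is
/// compiled once no matter how many reader types call into it.
fn decode_header(bytes: &[u8; 4]) -> Header {
    let len = u32::from_be_bytes([0, bytes[0], bytes[1], bytes[2]]);
    Header { len, kind: bytes[3] }
}

/// Generic shim. This is still monomorphized per `R`, but each copy only
/// contains the read call and a call to the shared `decode_header`.
fn read_header<R: Read>(reader: &mut R) -> io::Result<Header> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(decode_header(&buf))
}

fn main() -> io::Result<()> {
    // Two different reader types, so two instantiations of the shim only.
    let mut slice: &[u8] = &[0, 0, 5, 1];
    let mut cursor = Cursor::new(vec![0u8, 1, 0, 2]);
    println!("{:?}", read_header(&mut slice)?);
    println!("{:?}", read_header(&mut cursor)?);
    Ok(())
}
```

Whether the compiler keeps the inner function outlined in an optimized build is up to its inlining heuristics, so the size effect needs to be measured rather than assumed.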
Another mitigation which may help is figuring out how to make sure the generic parameters are the same across tonic/hyper/reqwest/etc, so that fewer monomorphizations occur.
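One way this could look, as a rough sketch only: if every caller erases its transport type behind the same boxed trait object, a generic function is instantiated once for that boxed type instead of once per concrete type. The names `drain` and `BoxedIo` below are hypothetical, `std::io::Read` stands in for the async I/O traits the real crates use, and I have not checked whether tonic/hyper/reqwest expose a hook for doing this.

```rust
use std::any::type_name;
use std::io::{Cursor, Read};

/// Stand-in for a large generic function such as a connection's `poll`.
/// Returns the number of bytes read and the type it was instantiated with.
fn drain<R: Read>(mut reader: R) -> (usize, &'static str) {
    let mut buf = Vec::new();
    let n = reader
        .read_to_end(&mut buf)
        .expect("in-memory reads should not fail");
    (n, type_name::<R>())
}

/// Hypothetical shared transport type that all callers agree on.
type BoxedIo = Box<dyn Read + Send>;

fn main() {
    // Two concrete reader types: `drain` is monomorphized twice.
    let (_, t1) = drain(Cursor::new(vec![1u8, 2, 3]));
    let (_, t2) = drain(&b"hello"[..]);
    println!("separate instantiations: {} and {}", t1, t2);

    // Both readers erased to the same boxed type: one instantiation.
    let a: BoxedIo = Box::new(Cursor::new(vec![1u8, 2, 3]));
    let b: BoxedIo = Box::new(&b"hello"[..]);
    let (n3, t3) = drain(a);
    let (n4, t4) = drain(b);
    println!("shared instantiation: {} ({} and {} bytes)", t3, n3, n4);
}
```

The trade-off is an allocation and dynamic dispatch on every I/O call, so this exchanges some runtime cost for smaller code and less compile-time work.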
My goal in filing this issue is to perhaps raise awareness of the issue, and brainstorm on potential solutions. Happy to add more information or gather more diagnostics on my particular case if it's helpful.
Activity
danburkert commented on Sep 8, 2020
The promised PR which removes duplication among the top offender (FramedRead<T>::decode_frame): #484
danburkert commented on Apr 28, 2021
Checking back in here. The story has gotten better since September, with both #509 and #503 having landed. I'm still seeing significant bloat in my example application; here is some new cargo-bloat output with h2 0.3.2:
danburkert commented on Apr 28, 2021
@Marwes I'm curious, as part of #503 did you try adding #[inline(never)] to some of the outlined functions? I would have expected that PR to have reduced the # of instances of some of those symbols; perhaps the compiler is undoing the manual outlining?
seanmonstar commented on Apr 28, 2021
Is that something that opt-level s or z can determine, so that if someone is optimizing for speed and the compiler feels like inlining some functions is fastest, we don't disrupt that?
Marwes commented on Apr 29, 2021
Given the size of the functions I doubt they are getting inlined and duplicated that way. Have you checked that you only have one version of h2/hyper/etc? I think I got it down to just two instantiations.
danburkert commented on May 3, 2021
Here's a bit more info, on the same application, but a different commit, so the bloat output shouldn't necessarily be compared to above. I have verified through cargo tree -d that none of the hyperium-adjacent dependencies have duplicate versions. This application is a server, listening on 3 ports. 2 ports are driven using warp and the 3rd uses tonic. Additionally, the application uses many more hyperium-based clients via either tonic or reqwest directly, or indirectly through deps such as opentelemetry-jaeger and sentry.
Selection of bloat output:
Interesting question, here's some data specific to this. I'm going to focus on the h2::proto::connection::Connection<T,P,B>::poll symbol as the largest duplicate, and which should have been un-duped by #503 (if I understand the PR correctly).
After adding rustflag opt-level = "z":
danburkert commented on May 3, 2021
I suspect the bloat may be caused by the T, P, B params being different among the different server/client instances. I haven't been able to figure out how to get cargo-bloat to reify these in the output.