Skip to content
This repository was archived by the owner on Apr 30, 2025. It is now read-only.

feat: log TLS handshake failures #404

Closed

Conversation

maxmoehl
Copy link
Member

  • A short explanation of the proposed change:

Currently any failure during a TLS handshake is emitted using the log package directly from the net/http library. These logs are not in the same format as the other gorouter logs which makes them hard to parse and they also lack a lot of the metadata we'd like to log to troubleshoot such issues.

This commit replaces the crypto/tls.Listener with a custom version. This custom version does the TLS handshake as part of the Accept which allows us to log all details from the tls.Conn and its tls.ConnectionState.

  • An explanation of the use cases your change solves

See above.

  • Instructions to functionally test the behavior change using operator interfaces (BOSH manifest, logs, curl, and metrics)

  • Expected result after the change

  • Current result before the change

  • Links to any other associated PRs

  • I have viewed signed and have submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests.

  • (Optional) I have run Routing Acceptance Tests and Routing Smoke Tests

  • (Optional) I have run CF Acceptance Tests

@maxmoehl
Copy link
Member Author

maxmoehl commented Mar 26, 2024

We can't merge this as-is since it will be doing the TLS handshake in the main accept loop. A slow TLS handshake could therefore block gorouter from accepting other connections resulting in a DoS attack vector.

One option we could consider is performing the handshake in a dedicated go routine to not block the http.Server routine.

Currently any failure during a TLS handshake is emitted using the log
package directly from the net/http library. These logs are not in the
same format as the other gorouter logs which makes them hard to parse
and they also lack a lot of the metadata we'd like to log to
troubleshoot such issues.

This commit replaces the crypto/tls.Listener with a custom version. This
custom version does the TLS handshake as part of the Accept which allows
us to log all details from the tls.Conn and its tls.ConnectionState.
@maxmoehl maxmoehl force-pushed the maxmoehl/tls-err-log branch from aa46898 to 9082c6c Compare March 26, 2024 09:11
@maxmoehl
Copy link
Member Author

maxmoehl commented Mar 27, 2024

I've been prototyping locally to avoid the extra goroutine that would be spawned for each TLS handshake. While it shouldn't do any harm in theory (TLS handshakes are relatively expensive anyways and goroutines are lightweight) it does increase the resource consumption on server side.

One idea was to wrap the tls.Conn at some place to let the handshake occur naturally but capture the result using the wrapper. Unfortunately that is not possible because net/http has a very intimate relationship with crypto/tls and explicitly relies on the concrete types matching.

I guess the proper way would be to have the stdlib expose the data somehow. After reading through some of the related open issues I think that golang/go#38270 is probably the right way to go.

I'll open up this PR for review as I think the overall tradeoff is reasonable (maybe put the change behind a feature flag?), looking forward to other opinions!

PS: If we decide that the overall change is something we want I will add some tests.

@maxmoehl maxmoehl marked this pull request as ready for review March 27, 2024 07:20
@maxmoehl maxmoehl requested a review from a team as a code owner March 27, 2024 07:20
go func(c *tls.Conn, log logger.Logger) {
// TODO: Suppress the log from http.Server, see: golang/go#56183
// TODO: Are deadlines already set on this connection? Probably not.
err = c.Handshake()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the handshake asynchronous to returning the tls connection? What are the consequenses of returning a connection that later has a failed TLS handshake? It looks like go is trying to make concurrent Handshake() calls thread-safe, but if one fails, it closes the connection. Will this result in panics when trying to read later on?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see your comments on the main issue now explaining the issue of inline handshakes.

Copy link
Member Author

@maxmoehl maxmoehl Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Read will first call Handshake again which will return the previously encountered error stored in Conn.handshakeErr.

So besides wasting CPU cycles and some RAM this should be good.

@geofffranks
Copy link
Contributor

I think having additional tls handshake failure logging could be helpful, but I'm pretty nervious about replacing core golang functionality with custom implementations. This should definitely have an opt-in feature flag.

@maxmoehl
Copy link
Member Author

maxmoehl commented Apr 4, 2024

I won't be able to continue on this for now but we intend to pick this up in the near future. To-dos:

  • Put this behind a feature flag.
  • Add a test to ensure logs are emitted for handshake failures.

I'll put this PR to draft for now.

@maxmoehl maxmoehl marked this pull request as draft April 4, 2024 13:17
@plowin
Copy link
Contributor

plowin commented Apr 19, 2024

As we currently do not have concrete reasons to apply such a workaround that brings some risk, we are closing the PR and are monitoring the progress on a proper solution via golang (golang/go#38270). Thanks for the contribution and feedback!

@plowin plowin closed this Apr 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants