
proposal: crypto/tls: dynamically reload root certificate authorities #64796


Open
p0lyn0mial opened this issue Dec 19, 2023 · 18 comments
Labels
Proposal Proposal-Crypto Proposal related to crypto packages or other security issues

@p0lyn0mial

p0lyn0mial commented Dec 19, 2023

Proposal Details

The current state of the tls.Config.RootCAs presents a challenge when it comes to dynamically reloading the Root Certificate Authorities. The existing approaches involve either disabling automatic certificate and host verification or limiting functionality to non-proxied requests. This proposal seeks to address this limitation by introducing a mechanism for dynamic reloading of Root CAs.

The current workaround for dynamically updating Root CAs involves using the VerifyPeerCertificate, but it requires disabling automatic certificate and host verification by setting the InsecureSkipVerify field to true. This approach might introduce some security risks if the custom implementation is not correct.

Alternatively, a custom TLS Dialer could be used. However, this approach falls short when dealing with proxied requests, limiting its applicability in scenarios where proxies are used.

To enhance the flexibility of TLS configurations, this issue proposes modifying the behaviour of tls.Config.RootCAs to support dynamic reloading by introducing a function similar to GetClientCertificate.

Note that in the past, a similar request was rejected. Since commenting on that issue is disabled, I decided to create a new issue to see if something has changed. Having such a mechanism in the standard library would help the Kubernetes community address kubernetes/kubernetes#119483.

Update:

It largely depends on how the tls.Config.RootCAs is used internally. Another solution would be to accept an interface instead of *x509.CertPool and allow clients to inject a thread-safe implementation, enabling trust reloading.

@gopherbot gopherbot added this to the Proposal milestone Dec 19, 2023
@seankhliao seankhliao changed the title proposal: dynamically reload Root Certificate Authorities in tls.Config proposal: crypto/tls: dynamically reload root certificate authorities Dec 19, 2023
@seankhliao seankhliao added the Proposal-Crypto Proposal related to crypto packages or other security issues label Dec 19, 2023
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Jan 2, 2024
@seankhliao
Member

cc @golang/security

@espadolini
Contributor

The discussion at #22836 mentions that it's possible to replicate the normal certificate verification behavior by specifying VerifyConnection and setting InsecureSkipVerify, but from what I can tell there's no inherent way for VerifyConnection to have access to the ServerName in the tls.Config in use if it's an IP address (tls.ConnectionState.ServerName is set to the value sent by the client as SNI, but the spec prescribes that IP addresses should not be sent as SNI, and the Go TLS handshake follows that).

Obviously a VerifyConnection callback will have access to the ServerName as specified in the original tls.Config, but the general behavior with TLS clients in the ecosystem seems to be that the calling code is intended to pass a tls.Config with no ServerName set, and the callee will clone the config and set the server name right before initiating the handshake, which will obviously not work with a custom VerifyConnection that only has a reference to the original tls.Config.
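To make the clone problem concrete, here is a small sketch with no network involved. `newConfig` and `dialLikeALibrary` are hypothetical helpers: the first builds a config whose callback reads ServerName from the original config, the second mimics a library that clones the config and sets ServerName right before the handshake.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newConfig builds a config whose VerifyConnection closure reads ServerName
// from the original *tls.Config it was created with.
func newConfig(seen *string) *tls.Config {
	cfg := &tls.Config{InsecureSkipVerify: true}
	cfg.VerifyConnection = func(tls.ConnectionState) error {
		*seen = cfg.ServerName // reads the original config, not the clone
		return nil
	}
	return cfg
}

// dialLikeALibrary mimics a callee that clones the config and sets
// ServerName just before the handshake. The callback is invoked directly
// to show what it would observe; no connection is made.
func dialLikeALibrary(cfg *tls.Config, host string) error {
	clone := cfg.Clone()
	clone.ServerName = host
	return clone.VerifyConnection(tls.ConnectionState{})
}

func main() {
	seen := "unset"
	cfg := newConfig(&seen)
	_ = dialLikeALibrary(cfg, "10.0.0.1")
	fmt.Printf("callback saw ServerName %q\n", seen)
}
```

The callback travels with the clone, but the config it closes over is still the original one, so the host set by the callee is never visible to it.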

@shaj13

shaj13 commented Dec 21, 2024

Maybe introduce a new callback in tls.Config that allows dynamically retrieving the latest CertPool, similar to the existing callback functions:

// GetRootCAs returns the set of root certificate authorities
// that clients use when verifying server certificates.
//
// If GetRootCAs is nil, then RootCAs is used.
GetRootCAs func() *x509.CertPool

The client handshake will work with the proposed changes as follows:

func (c *Config) rootCAs() *x509.CertPool {
	if c.GetRootCAs == nil {
		return c.RootCAs
	}
	return c.GetRootCAs()
}

func (c *Conn) verifyServerCertificate(certificates [][]byte) error {
	// ... (parsing of certificates into certs and the preceding branch elided)
	} else if !c.config.InsecureSkipVerify {
		opts := x509.VerifyOptions{
			Roots:         c.config.rootCAs(), // changed: was c.config.RootCAs
			CurrentTime:   c.config.time(),
			DNSName:       c.config.ServerName,
			Intermediates: x509.NewCertPool(),
		}

		for _, cert := range certs[1:] {
			opts.Intermediates.AddCert(cert)
		}
		chains, err := certs[0].Verify(opts)
		if err != nil {
			c.sendAlert(alertBadCertificate)
			return &CertificateVerificationError{UnverifiedCertificates: certs, Err: err}
		}

		c.verifiedChains, err = fipsAllowedChains(chains)
		if err != nil {
			c.sendAlert(alertBadCertificate)
			return &CertificateVerificationError{UnverifiedCertificates: certs, Err: err}
		}
	}
	// ... (rest of the function elided)
}

@ms-jcorley

ms-jcorley commented May 2, 2025

A GetRootCAs function would greatly help in a fast-rotating custom PKI scenario.
The InsecureSkipVerify/VerifyConnection workaround is very error-prone, and it is hard to be sure you got it right.

EDIT: reading the envoy proxy issue linked above, another (likely better) workaround is the advancedtls package:
https://pkg.go.dev/google.golang.org/grpc/security/advancedtls

@rolandshoemaker
Member

rolandshoemaker commented May 7, 2025

It sounds like the concrete proposal here is to make the following change to the tls.Config type:

type Config struct {
	...

	// GetRootCAs, if not nil, is called when a client verifies a server
	// certificate in order to retrieve the set of root certificate authorities
	// to use when verifying said certificate. If set, the contents of RootCAs
	// will be ignored.
	//
	// If GetRootCAs returns an error, the handshake will be aborted and that
	// error will be returned. Otherwise GetRootCAs must return a non-nil
	// x509.CertPool.
	GetRootCAs func() (*x509.CertPool, error)

	// RootCAs defines the set of root certificate authorities that clients use
	// when verifying server certificates. If GetRootCAs is set, RootCAs will be
	// ignored. If RootCAs is nil, TLS uses the host's root CA set.
	RootCAs *x509.CertPool

	// GetClientCAs, if not nil, is called when a server verifies a client
	// certificate in order to retrieve the set of root certificate authorities
	// to use when verifying said certificate. If set, the contents of ClientCAs
	// will be ignored.
	//
	// If GetClientCAs returns an error, the handshake will be aborted and that
	// error will be returned. Otherwise GetClientCAs must return a non-nil
	// x509.CertPool.
	GetClientCAs func() (*x509.CertPool, error)

	// ClientCAs defines the set of root certificate authorities that servers
	// use if required to verify a client certificate by the policy in
	// ClientAuth. If GetClientCAs is set, ClientCAs will be ignored.
	ClientCAs *x509.CertPool
}

This seems reasonable.

@sigmavirus24

@rolandshoemaker is there a need to write up a proposal doc for this? If not, I'd be happy to implement this.

@gakesson

gakesson commented May 7, 2025

@rolandshoemaker I like this proposal for root CAs. Even though this ticket does not explicitly mention client CAs, I guess the same applies to the ClientCAs field, so for consistency a corresponding GetClientCAs would be beneficial.

@rolandshoemaker
Member

> @rolandshoemaker I like this proposal for root CAs. Even though this ticket does not explicitly mention client CAs, I guess the same applies to the ClientCAs field, so for consistency a corresponding GetClientCAs would be beneficial.

Ah yes, I overlooked that. Updated my comment to reflect both sides.

@FiloSottile
Contributor

On the server side there is already GetConfigForClient, which allows rotating the ClientCAs field.

I'm a bit uneasy about adding callbacks piecemeal for every single thing one might want to rotate in a client-side tls.Config. Is there a strong reason Certificates and RootCAs need callbacks, and not MinVersion and NextProtos and EncryptedClientHelloConfigList? At least the GetClientCertificate callback takes input from the handshake. I always found argument-less callbacks an indication of something that went wrong.

@rolandshoemaker
Member

Of course I also forgot about GetConfigForClient.

I think for RootCAs though this still makes sense, we've heard from multiple people who need long running servers that have pools that change on a not particularly infrequent basis, and who explicitly don't want to have to restart the server in order to reload the pool. I think for a lot of other things (like MinVersion and NextProtos) which are unlikely to change particularly frequently, this wouldn't make sense.

@sigmavirus24

My employer has ways of updating trust stores on disk and we're hoping to be able to remove roots as quickly as possible without requiring everything to always restart. If we update the file, we can implement something to reload the bundle and prune the removed root.

@gakesson

gakesson commented May 7, 2025

@FiloSottile ah that's right, I forgot about GetConfigForClient.
However, like @sigmavirus24, we also need to reload root CA certificates at runtime, due to an internal PKI with renewed/revoked CAs, re-bundling of trust, etc. In my experience, fields like the minimum TLS version or ALPN are pretty much deploy-time settings, whereas certificates are more dynamic in nature, coming from e.g. Kubernetes secrets, so to me it is reasonable that they can be dynamically reloaded.

I would find a callback for root CAs very useful for this reason. Right now I have to recreate the client's http.Transport whenever a CA certificate change is detected, so that I can provide a new Transport with a new tls.Config and updated root CAs.

@aclements
Member

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— aclements for the proposal review group

@aclements aclements moved this from Incoming to Active in Proposals May 8, 2025
@gopherbot
Contributor

Change https://go.dev/cl/670815 mentions this issue: crypto/tls: allow dynamic reloading of root certificates

@espadolini
Contributor

@gakesson for http.Client/http.Transport specifically you can define your own DialTLSContext today, and just ignore the internal TLSClientConfig; it won't work for proxied requests however, so your dialer would have to also deal with that (but realistically you should consider swapping out the builtin http proxy support for something custom that supports SOCKS too, imho).

@Galabar001

Something to add here is that the GetConfigForClient mechanism can be problematic. For example, the gRPC library will clone and modify the provided tls.Config struct:

https://github.com/grpc/grpc-go/blob/master/credentials/tls.go#L237

These changes won't be reflected in the client connection:

https://cs.opensource.google/go/go/+/master:src/crypto/tls/handshake_server.go;l=162?q=GetConfigForClient&ss=go%2Fgo&start=11

as that config will be whatever is returned by GetConfigForClient.

This new, proposed mechanism alleviates that issue (and is important for those of us who need to dynamically update client and server CA certificates).

I just wanted to point out that GetConfigForClient is not a full solution in some cases.

@sigmavirus24

@Galabar001 since *ClientHelloInfo has a copy of the config before GetConfigForClient is called, I wonder if the simplest path to making GetConfigForClient work correctly would be a proposal to add func (chi *ClientHelloInfo) Config() *tls.Config { return chi.config.Clone() } to allow GetConfigForClient to then modify that or whatever else may be needed.

@Galabar001

Galabar001 commented May 9, 2025

> @Galabar001 since *ClientHelloInfo has a copy of the config before GetConfigForClient is called, I wonder if the simplest path to making GetConfigForClient work correctly would be a proposal to add func (chi *ClientHelloInfo) Config() *tls.Config { return chi.config.Clone() } to allow GetConfigForClient to then modify that or whatever else may be needed.

That's a solid idea.

What I think we'll usually see with "GetConfigForClient" is a cached tls.Config protected by a mutex. There will be a background thread that routinely loads a new tls.Config and swaps out the original. GetConfigForClient simply locks the mutex and returns either the current tls.Config, or a copy of it (for safety).

Instead, what could be cached would be the tls.Certificate and CertPool. The initial tls.Config that you create would be fully filled out. Then, you'd just call your "ClientHelloInfo.Config" function above, copy that config, replace the Certificates and RootCAs/ClientCAs values, and return that copy.

Yeah, that would be much better than the current situation.

However, if the RootCAs and ClientCAs could be updated dynamically, we could avoid GetConfigForClient entirely.

At the moment, though, I think we are not in a great position -- we don't have a mechanism that works for 100% of the current (Google library) use cases without some amount of hacking (e.g. filling in the NextProtos for gRPC configs before calling the gRPC library).

Projects
Status: Active