use TLS to communicate with Consul Connect services #1
type ConnectConfig struct {
	ConnectAware bool `description:"Enable Consul Connect support." json:"connectAware,omitempty" toml:"connectAware,omitempty" yaml:"connectAware,omitempty"`
	// NOTE: Unless the certificates are issued by a trusted authority, this needs to be set to true.
	InsecureSkipVerify bool `description:"Skip verifying certificates for communicating with Consul Connect services." json:"insecureSkipVerify,omitempty" toml:"insecureSkipVerify,omitempty" yaml:"insecureSkipVerify,omitempty"`
Is there no way to teach traefik to trust the Connect CA?
Hi @RavensburgOP
Not sure if I understand this. Do you mean you want to eliminate the distinction between the service and its proxy? If so then it won't work because the services are not exposed over the network when they run with a connect sidecar. Every service that participates in connect mesh gets its own isolated network, and the only way to get inside the network is to go via the sidecar. Only the proxy is able to perform TLS handshake.
I think it is better left for the cluster operator to decide. Consul can implement this fallback logic using the L7 traffic management features. Implementing this in traefik will not only make it difficult for people to take full advantage of the L7 features in consul, it will also make the code more involved with all the things that consul is already doing for us.
it already works this way. The watcher loop relies on consul blocking queries and the state is refreshed if and only if there is an update available in catalog. See https://github.com/Gufran/traefik/pull/1/files#diff-c3e25341e36d91fba700c36777fee7b575bb7004a9521a76ec17d2fb3f748364R461-R473
I don't have any reservation or preference, I only wanted to make it work with Connect, so everything else you can do on top of that is an added bonus. Although you might want to think a little bit from the perspective of a person who is responsible for managing consul and consul connect, sometimes all we want from a proxy is just the proxy and nothing else. The more things you add to it the more likely it is that you'll begin to contend with Consul's L7 features. At a cursory glance the code looks fine to me, I'll be able to test it in a few hours.
Ok, I figured out my current issues. the
After that we still have to figure out how to actually verify the certs properly :/
Hi @Gufran Thanks for taking a look at my PR and for such a thorough reply! 😄
I think I might be a bit confused, because as far as I can understand from the code, the L7 traffic management in Consul is circumvented, since "consulService.ServiceAddress" returns the IP of the service. There'll be both a service for the connect-proxy and one for the service itself. I've set up a test environment with a service (
When I then call
I might be missing something here, so if you can see where I'm going wrong, I'd be very happy to get an explanation 😄
I think my inexperience in Go is probably showing here (my experience boils down to a few hours last Friday learning it, plus the time I spent figuring out what was going wrong). What tripped me up was this:
ticker := time.NewTicker(time.Duration(p.RefreshInterval))
for {
	select {
	case <-ctx.Done():
		ticker.Stop()
		return
	case rootCerts = <-rootChan:
	case leafCerts = <-leafChan:
	case <-ticker.C:
		p.certChan <- &connectCert{
			service: p.Connect.ServiceName,
			root:    rootCerts,
			leaf:    leafCerts,
		}
	}
}
I had read it as a pull mechanism, where the watcher functions were called on each tick. Thanks for helping me understand Go a bit better 😄
@apollo13 Good catch! I'll add a small todo and fix it later tonight or tomorrow 😄
No,
I guess it depends on how you look at it; in your example you should probably rename the sidecar service from
Imo a strong "no": if connect is configured then you want it to be used. If done right, traefik cannot (physically) connect to the non-connect service anyway.
I fetched a consul leaf certificate:
We somehow need to tell Traefik/go to not validate the hostname; I wonder if a low-cost solution would be a
EDIT:// what does
I'm a bit confused about this statement, since that's not what I'm seeing here. I'm running a mock environment through docker-compose with 2 services (api and web); they each have a sidecar proxy (api-sidecar-proxy and web-sidecar-proxy). When traefik calls Consul to ask for the list of services, both the service and the service proxy are returned.
With a default rule for consul services in traefik,
In traefik's logs and the dashboard, it also shows both the service itself and its sidecar.
It could be that this is because I'm using docker-compose to spin up the services, since docker-compose doesn't have a concept of sidecars and everything simply shares a network bridge. I'd like to see if this also happens if I set things up in Nomad.
I think I've phrased my previous posts poorly. I'm not saying "I want" to expose both services to traefik, but in my setup they are by default, which made me assume that this is the expected behaviour with sidecar proxies. It now seems to me that I'm probably mistaken and something is not as it should be 😄
If set to
I did a quick look through the crypto/tls source code and it doesn't seem like there is a way to disable only the hostname validation 😕
Edit: I might have been wrong https://github.com/golang/go/blob/928bda4f4a88efe2e53f3607e8d2ad0796b449c0/src/crypto/tls/example_test.go#L186-L228
Yes that is right, one has to provide a VerifyConnection callback to implement your own handling then, the test you linked shows an example of that. That said the biggest question is how to teach traefik to do that :)
the sidecar proxy running in the same containers or do you have 4 containers in total? But yes this will (even in nomad) register 4 services in consul (2 "real" and 2 "proxies")
That is also correct, but I generally disable exposedByDefault and explicitly enable exposing on the services where I want them exposed. Note that this only concerns traefik; in consul you'd always see all services.
When deploying that to nomad you will also see the same number of services, but the non-proxy services will not be reachable if (emphasis on if) you configured them properly (i.e. they should only listen on 127.0.0.1).
I think the best way to solve this is to ask the Consul maintainers to implement the callback on their end, then add a field in the
That makes sense. I think it will be important to describe this well in the traefik docs as a best practice, since it might not be intuitively obvious how to do this setup and preserve safety :) Thanks for the explanation, it helped me understand the underlying mechanisms much better!
They both work on the tls config, not the tls connection, which I think is needed to verify the calls themselves (like here). They are also private functions (they start with lower case), so they would need to be made public for us to get access to them.
Ah sorry, I don't know enough about go wrt private functions. But working on the TLS config object should be fine; when the roundtripper is generated we have a TLS config: https://github.com/traefik/traefik/blob/2747e240c1a97031367a1a566a1401a2367a54d2/pkg/server/service/roundtripper.go#L143-L148 -- at this point you'd also have access to the
Mhm, I am leaning towards simply copying
This is the code that upstream Go does for verification if
Properly validate consul root CA.
@Gufran The PR is now working, it would be great if you could merge this so we can continue the discussion on the original traefik PR.
Thank you guys, I really appreciate all the help.
Hi Gufran :)
What does this PR do?
As mentioned in this comment on your PR to traefik, I've fixed the problems you were experiencing when trying to establish a TLS connection to a Consul Connect service.
I decided to remove the "ConnectNative" configuration, since I couldn't find a way to use the Consul Connect package without registering traefik as a Connect Native service, and removing the option simplified the configuration slightly.
Slightly off-topic stuff
I've only made small changes to your original design, but I'd like to do some additional work on it such as:
I want to check which of these ideas (if any) you're on board with, as it's your PR and you probably have some plans of your own.
I can make new PRs for these if you think any of the suggestions will be a good idea. Let me know what you think :)