Support (m)TLS API Socket #24601
Ephemeral COPR build failed. @containers/packit-build please check. |
I don’t think we ever looked into this, largely because without certificate
authentication - which is hard to do - TLS doesn’t add much. Other people
not being able to read the traffic doesn’t matter if we are exposing
passwordless root over the network.
…On Tue, Nov 19, 2024 at 10:14 Paul Holzinger ***@***.***> wrote:
@jwhonce <https://github.com/jwhonce> @baude <https://github.com/baude>
@mheon <https://github.com/mheon> FYI. Had you ever looked at TLS support
for the podman service and remote client?
With the amount of code here this looks easy enough to accept to me but I
was wondering if there is/was any reason why we do not support it.
|
Just skimmed, but LGTM on approach. A lot less code than I was expecting for certificate auth. Now, if we have to start providing docs for how to do certificate auth right, I think the amount of changes grows substantially, but I don't think it's really worth it at this point. |
Sending the directory location of the private key out over the wire doesn't give me the warm fuzzies. Is it something that other software does? |
That certainly wasn't my intent, but I'm not sure what you're referring to. Is that a consequence of adding it to the conf file that I'm unaware of? |
It is not being sent anywhere; both the client and server read the files locally and then use the certificates to perform a normal TLS handshake via the Go std lib AFAICT.
@jwhonce I am not sure what you are referring to. The question for me was whether there was a specific reason why TLS support was not done before. If not, then I think we should accept this PR. |
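As an illustration of the point above, here is a minimal sketch (not code from this PR) of a client that builds its trust pool locally and then performs an ordinary TLS handshake through the Go standard library. It uses the throwaway certificate from `net/http/httptest` in place of PEM files on disk; the function name `fetchOverTLS` is hypothetical.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchOverTLS starts a throwaway TLS server, trusts its certificate through
// a locally built pool, and performs one request. Only the handshake and the
// HTTP exchange cross the connection; the trust material stays in-process.
func fetchOverTLS() (string, error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "pong")
	}))
	defer srv.Close()

	// Stand-in for reading a CA bundle from a local file.
	pool := x509.NewCertPool()
	pool.AddCert(srv.Certificate())

	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}}
	defer client.CloseIdleConnections()

	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := fetchOverTLS()
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

For mTLS the client would additionally set `Certificates` in the `tls.Config`, typically from `tls.LoadX509KeyPair`, which likewise reads the files locally.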
Also this will need some e2e or system tests where we spawn a server with TLS and then connect with the remote client.
	"os"
)

func ReadCertBundle(path string) (*x509.CertPool, error) {
Can you move this into a new separate package, maybe call it cert, and then name the function ReadBundle()?
We should really move away from dumping everything into util packages as this causes a lot of unwanted side effects via other imports that really should not have to be imported on the remote client, i.e. #23818. Fixing that will be quite some work but in the meantime we should not add new things there that are needed by the remote client.
Done
if len(tlsCAFile) != 0 || len(tlsCertFile) != 0 || len(tlsKeyFile) != 0 {
	logrus.Debugf("using TLS cert=%s key=%s ca=%s", tlsCertFile, tlsKeyFile, tlsCAFile)
	transport.TLSClientConfig = &tls.Config{}
	connection.tls = true
}
if len(tlsCAFile) != 0 {
	pool, err := util.ReadCertBundle(tlsCAFile)
	if err != nil {
		return connection, fmt.Errorf("unable to read CA bundle: %w", err)
	}
	transport.TLSClientConfig.RootCAs = pool
}
if len(tlsCertFile) != 0 && len(tlsKeyFile) != 0 {
The conditions here do not seem to line up perfectly. If only tlsCertFile is set, then the first condition matches and says TLS is used, but then we never add the key in the last one. That case should return a hard error, I would assume.
My intent was to have that check performed in the cmd/podman/... packages, but on reflection it makes sense to also perform the same check here, as that wouldn't account for the env vars. Done.
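The check requested here can be sketched as a small validation step that runs before any TLS configuration is built. The helper name `validateTLSFiles` and its error text are assumptions for illustration, not the code that was merged.

```go
package main

import (
	"errors"
	"fmt"
)

// validateTLSFiles is a hypothetical helper: a client certificate and its
// private key are only usable as a pair, so setting exactly one is an error.
func validateTLSFiles(certFile, keyFile string) error {
	if (certFile == "") != (keyFile == "") {
		return errors.New("TLS client cert and key must be provided together")
	}
	return nil
}

func main() {
	fmt.Println(validateTLSFiles("client.pem", ""))        // only the cert is set
	fmt.Println(validateTLSFiles("client.pem", "key.pem")) // both set
}
```

Performing this in the bindings layer, and not only in the command-line layer, also covers values that arrive through environment variables.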
cmd/podman/system/connection/add.go
Outdated
	if cmd.Flags().Changed("tls-cert") {
		return errors.New("--tls-cert option not supported for ssh scheme")
	}
	if cmd.Flags().Changed("tls-key") {
		return errors.New("--tls-key option not supported for ssh scheme")
	}
	if cmd.Flags().Changed("tls-ca") {
		return errors.New("--tls-ca option not supported for ssh scheme")
	}
	return ssh.Create(entities, sshMode)
case "unix":
	if cmd.Flags().Changed("identity") {
		return errors.New("--identity option not supported for unix scheme")
	}
	if cmd.Flags().Changed("tls-cert") {
		return errors.New("--tls-cert option not supported for unix scheme")
	}
	if cmd.Flags().Changed("tls-key") {
		return errors.New("--tls-key option not supported for unix scheme")
	}
	if cmd.Flags().Changed("tls-ca") {
		return errors.New("--tls-ca option not supported for unix scheme")
	}
This duplicates the same conditions. Move them outside the switch case and match if scheme != tcp so you do not duplicate them.
Done
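One way the suggested refactor could look, as a sketch: a single loop ahead of the switch rejects the TLS flags for every scheme other than tcp. The `changed` callback is a stand-in for `cmd.Flags().Changed`, and `rejectTLSFlags` is a hypothetical name.

```go
package main

import (
	"fmt"
)

// rejectTLSFlags returns an error if any TLS flag was set for a scheme that
// does not support TLS. changed stands in for cmd.Flags().Changed.
func rejectTLSFlags(scheme string, changed func(name string) bool) error {
	if scheme == "tcp" {
		return nil
	}
	for _, name := range []string{"tls-cert", "tls-key", "tls-ca"} {
		if changed(name) {
			return fmt.Errorf("--%s option not supported for %s scheme", name, scheme)
		}
	}
	return nil
}

func main() {
	set := map[string]bool{"tls-key": true}
	changed := func(name string) bool { return set[name] }
	fmt.Println(rejectTLSFlags("ssh", changed))
	fmt.Println(rejectTLSFlags("tcp", changed))
}
```

The error messages keep the same wording as the per-scheme checks they replace, so existing tests that match on them would not need to change.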
Force-pushed: cceab7f to 0748cfe
Of course. I was hoping to have my environment capable of passing tests on main before I went on vacation last week, but that didn't end up happening. After throwing up my hands and just making a fresh CentOS VM, I have that working now. I'd like to use any existing tests for SSH/TCP remotes as a basis for my tests, as well as ensure my tests have parity with them, but for the life of me, I cannot seem to find any such tests. Would you mind pointing me to where they are? |
@Luap99 sorry, just back to this. This line: https://github.com/containers/podman/pull/24601/files#diff-cf5cc76e70a369c5221f08f29ec0452710af1f4ecd423c162d5b669688416caeR525 seemed to indicate that you were looking for the directory where the private key resided. Unless I misinterpreted, that doesn't feel right. |
Yes, the purpose of this line is to create a command line argument so that the user can provide the path to their TLS client private key, which is necessary to perform authentication. Unless I too have misinterpreted something else in the codebase, neither that path, nor especially not the data in that file, ever goes over the wire. This is the analog of the |
Apologies for the delay on this, a number of other items were pushed onto my priority stack, but I've been able to resume work on it.

As mentioned, there weren't any substantial e2e tests for the TCP API, and the libpod remote test suite was entirely using the unix socket, so I have refactored it to allow running it for unix, plain TCP, TLS, and mTLS. This might seem excessive, but it was only by running this complete suite that I found additional issues I had to fix. Suggestions are welcome for how to achieve similar coverage with fewer tests.

Speaking of said issues, it would seem that the container attach client has a subtle race condition in it, as well as a memory leak. By overwriting the http transport and capturing the net.Conn as a local variable, if the same Connection is used concurrently, there is a possibility that the net.Conn captured in the first attach call's local variable will be overwritten, resulting in the two calls proxying each other's streams, or just outright corrupting them. As well, because the original transport is never restored, if the same client is used to attach to multiple containers, the closure is never freed, meaning the dial function will call an ever-increasing stack of closures. Fixing this issue felt outside the scope of this PR, but I have added a note warning of it to whoever feels like addressing it in the future. I would suggest adding some sort of

Finally, there were a number of other e2e tests that were failing in my fresh CentOS 9 environment, even on main, so I have simply ignored them for the time being. |
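The hazard described above comes from mutating a transport that several calls share. One possible way to avoid it, sketched here with hypothetical names and without reference to podman's actual attach code, is to give each call a private clone of the transport and its own capture slot, so nothing shared is overwritten and nothing needs restoring afterwards.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
)

// connCapture records the connection dialed on behalf of a single call.
type connCapture struct {
	mu   sync.Mutex
	conn net.Conn
}

func (c *connCapture) get() net.Conn {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.conn
}

// perCallClient clones base and wraps the clone's dial function. The base
// transport is left untouched, so concurrent calls cannot see each other's
// connection and repeated calls do not stack closures on a shared transport.
func perCallClient(base *http.Transport) (*http.Client, *connCapture) {
	capture := &connCapture{}
	t := base.Clone()
	dial := base.DialContext
	if dial == nil {
		dial = (&net.Dialer{}).DialContext
	}
	t.DialContext = func(ctx context.Context, network, addr string) (net.Conn, error) {
		conn, err := dial(ctx, network, addr)
		if err == nil {
			capture.mu.Lock()
			capture.conn = conn
			capture.mu.Unlock()
		}
		return conn, err
	}
	return &http.Client{Transport: t}, capture
}

// demo performs one request and reports whether a connection was captured
// and whether the base transport's dial function is still unset.
func demo() (bool, bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	base := &http.Transport{}
	client, capture := perCallClient(base)
	defer client.CloseIdleConnections()

	resp, err := client.Get(srv.URL)
	if err != nil {
		return false, false, err
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body)
	return capture.get() != nil, base.DialContext == nil, nil
}

func main() {
	captured, untouched, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(captured, untouched)
}
```

The trade-off is that a cloned transport does not share the base transport's idle connection pool, which is usually acceptable for a hijacked, long-lived stream.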
just a few comments, I need more time to look at the test changes. But we definitely cannot just explode the matrix and run 4 times the amount.
Makefile
Outdated
	$(MAKE) ginkgo-run TAGS="$(REMOTETAGS) remote_testing remote_unix_testing"
	$(MAKE) ginkgo-run TAGS="$(REMOTETAGS) remote_testing remote_tcp_testing"
	$(MAKE) ginkgo-run TAGS="$(REMOTETAGS) remote_testing remote_tls_testing"
	$(MAKE) ginkgo-run TAGS="$(REMOTETAGS) remote_testing remote_mtls_testing"
This is not acceptable to me. It is certainly great to force coverage, but we run this many times on each PR. A 4x time increase is not acceptable.
What we can consider is some split testing. We run the tests on Fedora rawhide, 41, 40 and Debian sid, so technically we could wire this up in CI to run each case on a different distro, adding no new overhead while still getting full coverage. The transport layer should certainly not care about the distro (except underlying kernel bugs of course), so I think that may be the best option.
I agree, this was a "nuclear option" in order to get tests to run with the least amount of change to the tests themselves, and to make sure that I wasn't missing anything.
One thought I had in the interim was that ginkgo allows tagging tests and sets of tests, and selecting a subset of them on the command line. Using this, it would be possible to run the full remote test suite via unix sockets, and then a subset of all tests against tcp, tls, and mtls. It would then also be possible to add a separate target to run all tests in all remotes, which wouldn't be run in CI, only on-demand in development environments.
The main thing I would need for this is guidance from the podman core team on which tests they believed were critical to be tested over every possible remote, and which are "good enough" to only be tested over unix. For example, the attach endpoints would definitely need to be tested on all different remotes, as that's how I discovered the additional fixes I had to make.
I haven't heard back, so I've gone ahead with this strategy with a small selection of, admittedly arbitrary, tests to pare down to when testing TCP, TLS, and mTLS, based mostly on gut feeling of "things that might go wrong when using a different transport".
@Luap99 WDYT of the current labeling tests solution vs running past-fedora tests using tls? The labels are a clever spot checking solution, but I think editing the cirrus matrix for split testing feels more maintainable?
cmd/podman/system/connection/list.go
Outdated
} else {
	rpt, err = rpt.Parse(report.OriginPodman,
		"{{range .}}{{.Name}}\t{{.URI}}\t{{.Identity}}\t{{.Default}}\t{{.ReadWrite}}\n{{end -}}")
		"{{range .}}{{.Name}}\t{{.URI}}\t{{.Identity}}\t{{.TLSCAFile}}\t{{.TLSCertFile}}\t{{.TLSKeyFile}}\t{{.Default}}\t{{.ReadWrite}}\n{{end -}}")
IMO this is a breaking change. While nobody should rely on the order of the output, and anyone using it in scripts should use --format, we can never know.
I guess I could be convinced to add them as last keys after readwrite.
I would agree, and while I'm not sure if podman or the umbrella containers project has strict guidelines on the topic, I've always considered "human readable" data to be excluded from breaking changes. I would no sooner worry about breaking scripts that scrape this output than scripts that break if a new log message was added.
After thinking about this for a bit, if breaking changes are that much of a concern, I have hidden the new outputs behind a flag. My thought is that this flag can be removed and this output made the default at the next major version bump when breaking changes are permissible.
Looks like this is breaking some system tests ATM.
I think this still counts as a breaking change ATM. I think the best thing to do is just to drop them, or to move them to the end, after readwrite. I prefer to drop them and just document them as format options, so that if a user wants to see them, they can use --format to display them.
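To show why format options leave the default table untouched, here is a small sketch using plain `text/template`. It is not podman's `report` package, and the `connection` type is hypothetical, with field names mirroring those in the template quoted above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// connection is a hypothetical row type for illustration only.
type connection struct {
	Name, URI, TLSCAFile, TLSCertFile, TLSKeyFile string
}

// render applies a user-supplied per-row format to every row, one per line.
func render(format string, rows []connection) (string, error) {
	tmpl, err := template.New("list").Parse("{{range .}}" + format + "\n{{end}}")
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, rows); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	rows := []connection{{Name: "dev", URI: "tcp://localhost:8080", TLSCAFile: "/etc/ca.pem"}}
	out, err := render("{{.Name}}\t{{.TLSCAFile}}", rows)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because the TLS columns only appear when a caller names them in the format string, scripts that parse the default column order keep working.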
Context("without running API service", func() {
	It("add ssh://", func() {
		cmd := []string{"system", "connection", "add",
I guess these changes are done by your formatter? Can you revert them, they bloat the diff unnecessarily and make reviewing harder
I believe all of the errant auto-formats have been reverted. Please let me know if I have missed any.
Force-pushed: 3e9683f to 817a696
This feature has become a priority for us, so I'm going to try to push this over the finish line, working on top of this draft. @meln5674 If you have any concerns, please feel free to reach out and let me know 😊 |
The feature is basically done, and I'm happy to finish it myself, but I've been waiting for a second review since February. It's only marked as draft because it depends on the other PR in common being merged first. |
@meln5674 Apologies, thanks for all your work! Happy to give reviews to push this through 😊
Just one nit.
Other than that, I think we need to wire up and test the common changes through Podman's CI before either PR merges. Could you do a go mod replace pointing to your c/common branch, and vendor that in as a separate commit here, just so we can throw it at CI? Once that passes, we can drop the commit here before merging.
pkg/bindings/connection.go
Outdated
	return NewConnectionWithIdentityOrTLS(ctx, uri, identity, "", "", "", machine)
}

func NewConnectionWithIdentityOrTLS(ctx context.Context, uri string, identity string, tlsCertFile, tlsKeyFile, tlsCAFile string, machine bool) (context.Context, error) {
Nit, but could we put the TLS stuff into a TLS struct? The number of args going into this function is getting long enough that it becomes a maintenance hassle.
Done
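A sketch of the shape this nit suggests: the three TLS paths travel together in one struct, so the constructor signature stops growing. The type name `TLSFiles` and the function `describeConnection` are hypothetical and do not reflect the names used in the merged change.

```go
package main

import "fmt"

// TLSFiles is a hypothetical grouping of the three PEM file paths.
type TLSFiles struct {
	CertFile string
	KeyFile  string
	CAFile   string
}

// Enabled reports whether any TLS file was configured.
func (t TLSFiles) Enabled() bool {
	return t.CertFile != "" || t.KeyFile != "" || t.CAFile != ""
}

// describeConnection stands in for a constructor that takes the struct in
// place of three separate string arguments.
func describeConnection(uri, identity string, tlsFiles TLSFiles, machine bool) string {
	return fmt.Sprintf("uri=%s tls=%t", uri, tlsFiles.Enabled())
}

func main() {
	fmt.Println(describeConnection("tcp://host:8080", "", TLSFiles{CAFile: "ca.pem"}, false))
	fmt.Println(describeConnection("unix:///run/podman.sock", "", TLSFiles{}, false))
}
```

Named fields also remove the risk of passing the cert, key, and CA paths in the wrong positional order.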
Will do. I'll need to spend a bit of time resurrecting the dev VM I had for this today to make sure tests still all pass locally after rebasing. |
Force-pushed: 5cbec1b to ab88302
@ashley-cui Understood. I made the nit'ed changes and rebased this branch on them. Once the container-libs PR is merged, I will point common back to upstream and squash all changes into a single commit for final review and CI run. |
Force-pushed: 7eb37ec to 0c0578e
c/container-libs PR merged, please re-vendor and squash.
@containers/podman-maintainers PTAL
It doesn't look like container-libs has been assigned a new semantic version. I'll go ahead and re-vendor on the commit hash for now, but should it be like that for the final merge? Should there be a separate PR that just bumps the dependency? It also appears that the CI is complaining about differences in tool versions, which I suspect are between HEAD on container-libs and podman. EDIT: Didn't realize there was a special Makefile target for vendoring, going to try that. For whatever reason that wasn't needed for vendoring the fork. |
Force-pushed: 0c0578e to 01f2562
Not sure if this is a me problem, but github has been stuck on "Checking for the ability to merge automatically..." for the last hour and change, and hasn't detected the new commit I pushed. Unsure of what I can do about this. |
@meln5674 Strange, feel free to re-push or to close and re-open the PR, see if that kicks the CI. |
Force-push is usually the most reliable way |
* Added flags to point to TLS PEM files to use for exposing and connecting to an encrypted remote API socket with server and client authentication.
* Added TLS fields for system connection ls templates.
* Added special "tls" format for system connection ls to list TLS fields in human-readable table format.
* Updated remote integration and system tests to allow specifying a "transport" to run the full suite against a unix, tcp, tls, or mtls system service.
* Added system tests to verify basic operation of unix, tcp, tls, and mtls services, clients, and connections.

Signed-off-by: Andrew Melnick <[email protected]>
Force-pushed: f05cb0f to feb36e4
Force push did the trick. I think the three failures are flakes again, as they don't look related to remotes, could those be checked and restarted? |
Concur all look like flakes, restarted |
One more WSL flake, it looks like. |
re-started the flaked test. |
LGTM @containers/podman-maintainers PTAL and merge |
To add some context: we update by commit during the dev cycle. Only before we cut new podman/buildah releases do we cut tags on the other modules and then use those. Otherwise we would end up with a ton of new tags during dev, which doesn't seem that helpful. |
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Luap99, meln5674 The full list of commands accepted by this bot can be found here. The pull request process is described here
Thanks for your patience and work on this one, @meln5674 ! |
Add flags --tls-cert, --tls-key, --tls-ca/--tls-client-ca to the commands podman remote, podman system service, and podman system connection add to support serving the API socket using TLS and mTLS, as well as connecting to such a socket.
This relies on containers/common#2249 and will fail CI until merged.
Fixes #24583
Does this PR introduce a user-facing change?