install to-disk with LUKS + TPM broken #421
I am able to unlock the LUKS volume via TPM on a live system (FCOS) booted on the same VM I used to perform the …
I think my proposal is that, if it remains possible to install to a block device and into a LUKS volume, a user-supplied password ought to be an option (it appears this would be a straightforward modification). The LUKS password can be changed or removed later, once the system is installed. If a user opts to forgo a password by not specifying one, they would get the current behavior, with no means of recovery. This would necessitate dropping the …
Looks like this was observed with bootc 0.1.7 on an image created from the treefile configuration at this point, which resulted in this container image.
Thanks for filing this; indeed, we have a CI gap here. (The integration of …
There needs to be a failover way to unlock any LUKS volumes bound to tokens at install time. Right now bootc actually creates a temporary password and then discards it. At a very basic level, this could just be printed to the console as a crude way in if the TPM method isn't working. This is a basic issue with the to-disk install path. There really should be some user-configurable options, though. Unless the thought is that bootc will only handle the bottom half of the logic (installing to a pre-made filesystem) and the to-disk install path gets removed?
This was tested with Fedora 39, which has plenty of ostree-based releases, so I don't think that is necessarily the issue. Are there specific additional integrations needed (link to docs, please)? Based on my detailed and exhaustive review of the existing state of the art (i.e., everything here), I'm really coming up short on why this specific configuration isn't working. Looking at how the LUKS configuration is passed to the initrd via kargs, and having double-checked that the necessary libraries are present in the initrd, the failure to unlock really doesn't make sense. Are any modifications beyond adding a few dracut modules needed to support TPM2-based unlocking with systemd-cryptsetup in the initrd? That is about what I'd expect to need to do to preconfigure an initrd with the proper libraries. I've done this successfully before on Debian and on prior versions of Fedora. Any guesses at what else should be checked here?
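For reference on the "passed to the initrd via kargs" point: systemd-cryptsetup-generator reads `rd.luks.uuid=` and `rd.luks.options=` from the kernel command line, and `tpm2-device=auto` is the crypttab option that requests TPM2 unlocking. The minimal Python sketch below just pulls those arguments out of a command line so they can be compared between a working and a failing boot. The example command line and its UUIDs are placeholders; they are an assumption, not the exact kargs bootc writes.

```python
from __future__ import annotations

from typing import Dict, List


def luks_kargs(cmdline: str) -> Dict[str, List[str]]:
    """Collect rd.luks.* / luks.* key=value arguments from a kernel command line."""
    found: Dict[str, List[str]] = {}
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        # Only keys that carry a value matter here, e.g. rd.luks.uuid=<uuid>.
        if sep and (key.startswith("rd.luks.") or key.startswith("luks.")):
            found.setdefault(key, []).append(value)
    return found


# Hypothetical command line; both UUIDs are placeholders.
example = (
    "root=UUID=placeholder-root rw "
    "rd.luks.uuid=placeholder-luks "
    "rd.luks.options=tpm2-device=auto"
)
print(luks_kargs(example))
```

On a live system the same function can be fed the contents of `/proc/cmdline`.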
For some use cases, all data on disk is effectively a cache, and having a failover adds risk and management overhead - questions around how is the failover secret rotated, etc.
There is some strong inherent tension here, but the overall thought on the design is that …
Valid concern for that specific case, but that has got to be one hell of an edge case. Even if it is a main use case, it creates a situation so unnecessarily difficult to debug that there needs to be an alternative to enable basic development. Behavior that discards, skips, or doesn't support a manual failover mechanism needs to be explicitly opted into by the user, or the silly default must have an opt-out. Right now it is a forced opt-in, which is a problem.
I'd encourage you to review all the logic in the supporting libraries and in systemd itself that is needed for this to work. It is anything but simple, and it increases the dependency footprint substantially (the proper libraries need to end up in the initrd, for instance). Far more complex than a password.
I previously tried the build from last week, and it successfully unlocked the LUKS root volume on reboot (sha256:1c5e91ab395665ca11e2c1a17df18beec39d63fd2948f15add9fd95e45c0c85b). Clearly some package changes in the past few days broke this? While waiting for this update, I also tried both F39- and F40/next-based builds with the latest bootc from the copr. Same story with those: failed systemd-cryptsetup units trying to unlock via TPM.
Thanks Jon, this is very valid feedback and thanks for looking at this.
For example in cloud environments I may want to enable LUKS to be very sure my data is encrypted, and binding to the virtualized TPM2 is a generic baseline for that that at least helps ensure that if e.g. someone gets access somehow to an underlying block store they can't read the data. I am sure some people want a fallback password even in cloud, but it's not very "cloud native" to log in interactively on a console in an IaaS.
Yes, fair enough. OK, so my inclination here is to make … Then, in parallel of course, we should: …
A great use for the recovery key option. If bootc could cleanly output the recovery key, a secondary process could store it in a secrets vault. You really cannot have a TPM2-only binding. Even systems like Windows provide recovery keys for when the PCRs change (as they are designed to do). Some deployments will want to bind to many more PCRs than PCR 7, and some of those PCRs may change even on an OS update.
There would also need to be a flag that has the system either create and provide a temporary password at install time or enroll a recovery key. Ideally both should be supported, at the user's or image builder's discretion (recovery keys are lengthy and would be particularly obnoxious to deal with during repetitive early-stage testing). As for the initramfs components specifically, the requirements need to be documented somewhere. Just having other projects (like centos-bootc) to reference isn't a great experience. It isn't at all obvious exactly which dracut modules should be added. Configuring the initramfs is a bit different here because the image is composed off-board from the system it will run on, so any auto-detection isn't going to work.
Personally, I would consider just removing the to-disk workflow. I'm not sure what value it brings if it can only handle trivially simple deployments. Is the juice actually worth the added complexity? I think it might be hard to avoid feature and complexity creep in what looks like a bare-metal installer feature. It might be easier to document a bit better how to use to-filesystem with an external workflow that prepares the disks, and to provide a reference shell script that can be included in an image build to do this. Something to think about.
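To make the "output the recovery key for a vault" idea concrete, here is a minimal Python sketch of a recovery key modelled on the format `systemd-cryptenroll --recovery-key` documents: 256 random bits rendered as 64 "modhex" characters in dash-separated groups of 8. It is illustrative only; a real install would let `systemd-cryptenroll --recovery-key` generate and enroll the key, and the idea of handing the printed value to a secrets-vault process is the commenter's proposal, not existing bootc behavior.

```python
import secrets

# Yubico "modhex" alphabet: one character per 4-bit nibble.
MODHEX = "cbdefghijklnrtuv"


def make_recovery_key() -> str:
    """Return a random 256-bit key formatted as 8 groups of 8 modhex chars."""
    raw = secrets.token_bytes(32)  # 256 bits from the OS CSPRNG
    chars = []
    for byte in raw:
        chars.append(MODHEX[byte >> 4])    # high nibble
        chars.append(MODHEX[byte & 0x0F])  # low nibble
    text = "".join(chars)
    return "-".join(text[i:i + 8] for i in range(0, len(text), 8))


if __name__ == "__main__":
    # A secondary process could ship this to a secrets vault instead of printing it.
    print(make_recovery_key())
```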
This allows the container image builder more control over `bootc install to-disk` via the installation config. Per discussion in bootc-dev#421, this one definitely requires integration by the base image, and not all of them will want it. (Or if they do want LUKS, they may want more control over it.) The default value is `block: ["direct"]`, which only enables the simple filesystem install. This change allows two different things. With `block: []`, `bootc install to-disk` will just error out; it's a way to effectively disable it for those who always want to use an external installer. Another possibility is `block: ["direct", "tpm2-luks"]`, to explicitly re-enable the builtin tpm2-luks flow. Or one could use just `block: ["tpm2-luks"]` to enforce encrypted installs.
Signed-off-by: Colin Walters <[email protected]>
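The semantics of the `block` list in that commit message can be sketched as a small selection function. This is a hypothetical Python model, not bootc's code: the function and exception names are invented for illustration, and the rule "with no explicit request, take the first allowed entry" is an assumption chosen so that `block: ["tpm2-luks"]` enforces an encrypted install.

```python
from typing import List, Optional

DEFAULT_BLOCK = ["direct"]  # default stated in the commit message


class ToDiskDisabled(Exception):
    """Hypothetical error: the image's install config allows no block setups."""


def choose_block_setup(allowed: Optional[List[str]],
                       requested: Optional[str] = None) -> str:
    """Pick a block setup honoring the image's `block` list (illustrative only)."""
    if allowed is None:
        allowed = DEFAULT_BLOCK
    if not allowed:
        # `block: []` means `bootc install to-disk` simply errors out.
        raise ToDiskDisabled("to-disk install is disabled by the image")
    if requested is None:
        # Assumption: fall back to the first allowed entry.
        return allowed[0]
    if requested not in allowed:
        raise ValueError(
            f"block setup {requested!r} not allowed; choose from {allowed}")
    return requested
```

For example, `choose_block_setup(["direct", "tpm2-luks"], "tpm2-luks")` returns `"tpm2-luks"`, while `choose_block_setup(["tpm2-luks"], "direct")` is rejected.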
#445 will effectively turn this off by default for now.
I find it extremely useful, as it provides a generic baseline, allowing a container image to self-install onto a block device without any other externally versioned infrastructure. (It also tries hard to force configuration to come from the container image by default.) Now, I did also file #440, which would make it much easier for containers to configure things in arbitrary ways.
Looks like between tags …
@cgwalters any ideas on this one? What might be causing systemd-cryptsetup to fail at unlocking the LUKS volume bound to the TPM? I do not see this issue on non-bootc Fedora ostree systems when binding the root LUKS volume to the TPM with systemd-cryptenroll. My personal opinion is that adding an option to "opt into" supporting LUKS volumes is a band-aid and a completely wrong response to the issue described here. I do not view it as a functional improvement. TPM2-bound LUKS volumes with systemd-cryptenroll work in other, normal Fedora distros, and they should also work here. If specific additional steps are needed for it to work, those need to be documented. A failover means of unlocking, via a recovery key or a plain pre-set random password, must also be included here by default.
Update: The culprit appears to be a shim-x64 package update. Downgrading shim-x64 to 15.6-2 resolves the failure to unlock the LUKS root volume. This was on a system (VM) that does not support Secure Boot, which makes no sense to me. The issue was also observed on metal/hardware with a TPM. Ultimately, the failure to unlock was caused by mismatched PCR 7 hashes (ref to systemd issue). I ran the … I am not a system firmware or TPM expert, so I do not really know what normal behavior is here. It seems odd that a shim update would cause a TPM PCR to change, particularly on a system that does not support Secure Boot. If version agreement is required, though, bootc needs to check for it between the install environment and the deployed image.
I think shim was recently re-signed in Fedora, and PCR 7 contains all the certificates involved, so skew between the host and the target is definitely the cause. While arguably … Onto the next bit: I would agree with your implication that if one is not using Secure Boot, it doesn't make sense to bind to PCR 7 at all... maybe that could be changed in systemd. bootc is just exposing the …
Note that again …
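Why host/target skew breaks unlocking follows from how PCRs work: a PCR cannot be set, only extended, with `new = H(old || digest)`. The simplified Python model below shows that a single differing measurement in the boot chain (standing in for the re-signed shim discussed above) yields a completely different final PCR 7 value, so a policy sealed against the installer's value no longer matches. The event byte strings are placeholders, not real UEFI or shim measurement data, and the model hashes each event itself for convenience.

```python
import hashlib
from typing import Iterable


def extend(pcr: bytes, event: bytes) -> bytes:
    """SHA-256 PCR extend: new = SHA256(old || SHA256(event))."""
    digest = hashlib.sha256(event).digest()
    return hashlib.sha256(pcr + digest).digest()


def replay(events: Iterable[bytes]) -> bytes:
    """Replay a sequence of measurements starting from a reset (all-zero) PCR."""
    pcr = bytes(32)
    for event in events:
        pcr = extend(pcr, event)
    return pcr


# Placeholder measurement logs: identical except for one certificate entry.
installer_boot = [b"secureboot-state", b"db-cert", b"shim-cert-old"]
target_boot = [b"secureboot-state", b"db-cert", b"shim-cert-new"]

print(replay(installer_boot) == replay(target_boot))  # False
```

Because extension is order-dependent and one-way, there is no way to "fix up" the target's PCR 7 afterwards; the only remedies are matching measurements or re-enrolling.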
Guaranteed to hit it, as I have demonstrated. Normal Fedora gets away with ignoring this detail because, by and large, systemd-cryptenroll is not supported out of the box. However, bootc appears to make a bold claim that it is supported, given the available installation options.
I don't think systemd is going to change their defaults; that is the wrong venue to deal with this. I also don't think they intend their default PCR selection to be a "production ready," "universal" configuration everyone should run. It's a default because they needed one and it is relatively unoffensive, but as I have shown here, it is not always workable. Binding a LUKS volume only to PCR 7 affords little in the way of actual security, since a lot can change before that PCR hash changes. The correct answer is that bootc needs to make the LUKS encryption and TPM binding aspects of the installation process more configurable by downstream users. Many users may wish to bind to additional TPM PCRs for added security, and others will want backup passwords. The current implementation, though mostly usable, is simply too naive. If the intent is to keep this feature around, it needs to be more configurable.
Message received; however, I think there are a few assumptions there that are neither fair, reasonable, nor substantiated. Yes, you probably don't want to build a sophisticated partitioning interface into the to-disk workflow, but I am merely highlighting that the to-disk workflow currently lacks basic features and configurability it ought to have. There is scant documentation on how one might install to a filesystem with a LUKS root volume. Given the difficulty I encountered with what should be the "easy mode" install, I am really hesitant to sink more time into an even less documented installation path. "DIY your install procedure" is not a resolution to this issue until, at a minimum, there is documentation showing the permutations of such usage.
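The trade-off between binding to more PCRs and surviving updates can be seen in a simplified policy model. The Python sketch below hashes the selected PCR values in index order and compares the result with the digest stored at enrollment. It deliberately omits the command code and PCR-selection encoding that a real TPM2 `PolicyPCR` digest folds in, and the PCR values are placeholders. It shows that a PCR 7-only policy survives a change confined to PCR 4 (boot manager code), a PCR 0+4+7 policy does not, and an empty selection always matches.

```python
import hashlib
from typing import Dict, Iterable


def pcr_policy_digest(pcrs: Dict[int, bytes], selection: Iterable[int]) -> bytes:
    """Simplified policy: SHA-256 over the selected PCR values in index order."""
    h = hashlib.sha256()
    for index in sorted(set(selection)):
        h.update(pcrs[index])
    return h.digest()


def can_unlock(stored: bytes, pcrs: Dict[int, bytes], selection: Iterable[int]) -> bool:
    """True if the current PCR values reproduce the digest stored at enrollment."""
    return pcr_policy_digest(pcrs, selection) == stored


# Placeholder PCR values at enrollment, then after an update touching only PCR 4.
enrolled = {0: b"\x00" * 32, 4: b"\x11" * 32, 7: b"\x22" * 32}
updated = {**enrolled, 4: b"\x33" * 32}

for sel in ([], [7], [0, 4, 7]):
    stored = pcr_policy_digest(enrolled, sel)
    print(sel, can_unlock(stored, updated, sel))
```

A mismatch in this model corresponds to the "current policy digest does not match stored policy digest" class of failure reported in this thread.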
"relatively unoffensive" is definitely a hinge point here.
Yes. I had thought PCR7 wouldn't be a problem but basically it doesn't provide much value, and only causes problems in practice.
Yes, we will work to improve the …
There is systemd-pcrlock coming, which appears to address some of the inherent issues with binding to plain PCRs. Ultimately, any type of policy-driven mechanism would need to be user-configurable as well. I've created issues for the enhancements …
Does `bootc install to-disk --block-setup tpm2-luks /dev/diskX` actually work? I tried this in a QEMU virtual machine with an emulated TPM (via swtpm), and while it installed successfully, upon rebooting the VM into the freshly installed OS the systemd-cryptsetup units failed to decrypt the LUKS volume. Has this actually been tested or otherwise known to work? I will try on real hardware, but this has me concerned that this feature is not really in a functional state.
I tested with vanilla Fedora 39 Server to try to rule out this being related to the use of a virtual machine with an emulated TPM. After installing tpm2-tools, adding the tpm2-tss dracut modules, and running systemd-cryptenroll for the LUKS volume, I had an installation that repeatedly unlocked automatically via the TPM at boot (no password and no failures). I also tried Fedora 39 Silverblue (added the modules to the initrd and enabled custom initramfs generation with rpm-ostree), with the same results. In both cases the LUKS volume was enrolled after the installed OS was provisioned and booted for the first time, although I really doubt that has any effect. I do not believe the test setup (i.e., the emulated TPM) is the problem here.
Eventually dracut times out and drops into a rescue shell in the initrd. The cryptsetup unit failed with a
`Current policy digest does not match stored policy digest, cancelling TPM2 authentication attempt.`
error. Further, an error of `No passphrase or recovery key registered` is also printed. I don't think this is a PCR issue.
Some observations:
- no PCR, so it will always unlock as long as the TPM is present
- PCR 7, the systemd-cryptenroll default.
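The two bindings compared above map onto `systemd-cryptenroll` options: `--tpm2-device=auto` selects the TPM, and `--tpm2-pcrs=` takes a `+`-separated PCR list, where an empty value binds to no PCRs. The Python sketch below only builds the command lines (it does not run them); the device path is a placeholder, and leaving `--tpm2-pcrs` unset falls back to the systemd default, which this thread identifies as PCR 7.

```python
import shlex
from typing import List, Optional, Sequence


def cryptenroll_argv(device: str, pcrs: Optional[Sequence[int]]) -> List[str]:
    """Build (not execute) a systemd-cryptenroll TPM2 enrollment command."""
    argv = ["systemd-cryptenroll", "--tpm2-device=auto"]
    if pcrs is not None:
        # "+"-separated list; an empty list yields "--tpm2-pcrs=" (no PCRs).
        argv.append("--tpm2-pcrs=" + "+".join(str(p) for p in pcrs))
    # pcrs=None omits the option, leaving systemd's default PCR selection.
    argv.append(device)
    return argv


# Placeholder device path; print the command for each binding compared above.
for choice in ([], [7]):
    print(shlex.join(cryptenroll_argv("/dev/vdaX", choice)))
```

Building the argv separately keeps the PCR choice a plain, testable value that an installer could read from configuration.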