Allow netbooting from a local UKI iso #269

Draft
wants to merge 2 commits into
base: main

jimmykarily
Contributor

Start the server with something like this:

go build -o build/auroraboot .

docker run --net host --rm --privileged -v $PWD/build/kairos-uki.iso:/kairos.iso -v /var/run/docker.sock:/var/run/docker.sock -v $PWD/build/:/output  -v $PWD/build/auroraboot:/bin/auroraboot --entrypoint /bin/auroraboot  quay.io/kairos/auroraboot --debug --set "local_iso=/kairos.iso" --set "netboot_type=uki" --set "state_dir=/output"

@jimmykarily
Contributor Author

I'm getting a kernel panic when immucore runs. I'm trying to debug. I'm putting these notes here to remember how I got some output from immucore.

  • Added this patch to immucore:
+++ b/main.go
@@ -4,6 +4,8 @@ import (
        "context"
        "fmt"
        "os"
+       "runtime"
+       "time"

        "github.com/kairos-io/immucore/internal/utils"
        "github.com/kairos-io/immucore/internal/version"
@@ -25,6 +27,12 @@ func main() {

                utils.MountBasic()
                utils.SetLogger()
+               if os.Getenv("DEBUG") != "" {
+                       runtime.Breakpoint() // This will cause the program to break when run under a debugger
+               } else {
+                       time.Sleep(2 * time.Minute)
+                       utils.DropToEmergencyShell()
+               }

This lets me run DEBUG=true gdb /usr/bin/immucore as soon as I am dropped to the emergency shell (it takes a while, but eventually the shell appears).

  • I've also built a custom kairos-init with this patch:
diff --git a/pkg/stages/steps_init.go b/pkg/stages/steps_init.go
index e8ff267..a4dc284 100644
--- a/pkg/stages/steps_init.go
+++ b/pkg/stages/steps_init.go
@@ -97,7 +97,7 @@ func GetInitrdStage(sys values.System, logger types.KairosLogger) ([]schema.Stag
                                                Owner:       0,
                                                Group:       0,
                                                Permissions: 0644,
-                                               Content:     fmt.Sprintf("add_dracutmodules+=\" %s \"\n", networkModule),
+                                               Content:     fmt.Sprintf("add_dracutmodules+=\" %s gdb \"\n", networkModule),
                                        },
                                },
                        },

to get gdb in the initramfs.

  • then I've built an image with this Dockerfile:
FROM quay.io/kairos/fedora:40-core-amd64-generic-v3.4.2-uki

RUN yum install -y gdb

ARG VARIANT=core
ARG MODEL=generic
ARG TRUSTED_BOOT=true
ARG FRAMEWORK_VERSION=v2.21.0
ARG VERSION=v1.2.3

COPY kairos-init /kairos-init
RUN /kairos-init --registry quay.io/kairos -f "${FRAMEWORK_VERSION}" -l debug -s install -m "${MODEL}" -v "${VARIANT}" -t "${TRUSTED_BOOT}" --version "${VERSION}"
RUN /kairos-init --registry quay.io/kairos -f "${FRAMEWORK_VERSION}" -l debug -s init -m "${MODEL}" -v "${VARIANT}" -t "${TRUSTED_BOOT}" --version "${VERSION}"
RUN /kairos-init --registry quay.io/kairos -f "${FRAMEWORK_VERSION}" -l debug --validate -m "${MODEL}" -v "${VARIANT}" -t "${TRUSTED_BOOT}" --version "${VERSION}"
RUN rm /kairos-init

COPY immucore /usr/bin/immucore

(notice how I copy the custom immucore and custom kairos-init)

  • I built a uki iso with:
docker run --privileged --net host -v $PWD/build:/work -v $PWD/e2e/assets/keys:/keys -v /var/run/docker.sock:/var/run/docker.sock -v $PWD/build/auroraboot:/bin/auroraboot --entrypoint /bin/auroraboot quay.io/kairos/auroraboot build-uki --output-dir /work --keys /keys --extend-cmdline "rd.debug rd.break=init rd.shell root=live:LABEL=COS_LIVE root=LABEL=COS_LIVE rd.neednet=1 ip=dhcp" --output-type iso docker://myimage

I serve it with Auroraboot using this command (from this branch):

go build -o build/auroraboot . && docker run --net host --rm --privileged -v $PWD/build/kairos-uki.iso:/kairos-uki.iso -v /var/run/docker.sock:/var/run/docker.sock -v $PWD/build/:/output  -v $PWD/build/auroraboot:/bin/auroraboot --entrypoint /bin/auroraboot  quay.io/kairos/auroraboot --debug --set "local_iso=/kairos-uki.iso" --set "netboot_type=uki" --set "state_dir=/output"
  • I create a VM in virt-manager and boot it from the network. Make sure you select a Secure Boot-enabled UEFI firmware when creating the VM. Also reset the firmware keys so that Kairos can enroll its own.

  • The VM will boot kairos and will try to start immucore, eventually dropping to an emergency shell (see the patch above). You can then run gdb /usr/bin/immucore -> run and see the output of immucore or add more breakpoints with runtime.Breakpoint().

I will try to simplify all the above by finding a way to get immucore to print to the graphical console instead of just the file. This way I can simply print messages and skip the long process above.

@jimmykarily
Contributor Author

jimmykarily commented May 6, 2025

First problem I see:

[image]

I think the problem is that there is no proper /proc/cmdline for some reason (do we need to serve it with netboot?). My UKI has the rd.immucore.uki parameter in its cmdline, but cat /proc/cmdline in the emergency console returns just "file". As a result, DetectUKIBoot fails and this check is false: https://github.com/kairos-io/immucore/blob/3c2b55ae1dacac6b951d88be269c82a28f91137e/internal/utils/common.go#L113
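
To make the failure mode concrete, here is a minimal sketch of the kind of check that breaks when /proc/cmdline only contains "file". It is not immucore's actual DetectUKIBoot code; the helper name hasCmdlineParam is hypothetical and only illustrates looking for rd.immucore.uki among the whitespace-separated kernel parameters.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasCmdlineParam reports whether a kernel command line contains the given
// parameter, either bare ("name") or with a value ("name=value").
// Hypothetical helper for illustration, not part of immucore.
func hasCmdlineParam(cmdline, name string) bool {
	for _, field := range strings.Fields(cmdline) {
		if field == name || strings.HasPrefix(field, name+"=") {
			return true
		}
	}
	return false
}

func main() {
	raw, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/cmdline:", err)
		return
	}
	fmt.Printf("cmdline: %q\n", strings.TrimSpace(string(raw)))
	fmt.Println("rd.immucore.uki present:", hasCmdlineParam(string(raw), "rd.immucore.uki"))
}
```

With a cmdline of just "file", a check like this returns false, which matches what I see in the emergency console.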

@jimmykarily
Contributor Author

jimmykarily commented May 6, 2025

This patch in immucore makes the messages appear in the graphical console and drops me to emergency console:

kairos-io/immucore#471

If we can make this configurable, maybe we can even merge a feature like that in immucore.

@jimmykarily
Contributor Author

I had a talk with ChatGPT and figured out that some UEFI implementations don't execute the EFI file properly: instead of entering through the stub's entrypoint, they directly execute the kernel (which is part of the UKI file). That way nothing sets the cmdline, resulting in the default "file" content.

It suggested something that worked. Instead of serving the Kairos EFI file via netboot, I served the GRUB EFI file from Fedora. I had to install two packages in the Auroraboot image (grub2-efi-x64 and grub2-efi-x64-modules) and set Efi: types.ID("/boot/efi/EFI/fedora/grubx64.efi") here.

This dropped me to a GRUB shell. Supposedly, according to ChatGPT, GRUB would try to fetch grub.cfg from the same HTTP server the EFI file was loaded from, but to test that I would have to change the netboot library we use to serve it. Instead, I spun up a Python server that served the Kairos EFI file (python3 -m http.server) and ran the following in the GRUB terminal:

grub> chainloader (http,192.168.1.36:8000)/netboot.uki.efi
grub> boot

I had a custom build of immucore that dropped me to an emergency shell right after mounting /proc. I was able to check /proc/cmdline and it was set properly. I also took a video of the boot process (because it falls into a kernel panic before I can see what's going on), and immucore gets much further before things break.

This proves that ChatGPT was right that the UEFI firmware can't properly run the UKI file. GRUB loads it correctly and the cmdline is set.

Of course this is just a hack. We would need a way to chainload our EFI file from GRUB automatically, without manually running any commands in the GRUB terminal. Maybe GRUB will indeed try to load grub.cfg automatically from the same HTTP server. I'll investigate that option.

If this works, we can move on to debugging the next failure. Strangely, although I had a custom build of immucore that should drop me to a shell if a panic occurred, that didn't happen and I got a kernel panic instead.
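
To rule out the UKI file itself as the cause of the missing cmdline, one can check that the served EFI file really carries the PE sections a UKI is expected to have (.linux, .initrd and .cmdline). The following is a rough sketch using Go's standard debug/pe package; the helper name missingSections and the tool itself are hypothetical and not part of Auroraboot or immucore.

```go
package main

import (
	"debug/pe"
	"fmt"
	"os"
)

// missingSections returns the wanted section names that are absent from have.
// Hypothetical helper for illustration.
func missingSections(have, want []string) []string {
	seen := make(map[string]bool, len(have))
	for _, n := range have {
		seen[n] = true
	}
	var missing []string
	for _, n := range want {
		if !seen[n] {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: ukicheck <file.efi>")
		return
	}
	// A UKI is a PE binary, so the section table can be read directly.
	f, err := pe.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "not a readable PE file:", err)
		return
	}
	defer f.Close()

	var names []string
	for _, s := range f.Sections {
		names = append(names, s.Name)
	}
	fmt.Println("sections:", names)
	fmt.Println("missing:", missingSections(names, []string{".linux", ".initrd", ".cmdline"}))
}
```

If .cmdline is present in the file but /proc/cmdline is still wrong after a direct netboot, that points at how the firmware starts the image rather than at how the UKI was built.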

@jimmykarily
Contributor Author

In any case, GRUB is not going to cut it because, without systemd-boot, there is nothing to enroll the keys in the firmware.

We tried a different approach: serving the iPXE EFI file (this one: https://boot.ipxe.org/ipxe.efi). The system enters a loop because, as soon as iPXE starts, it requests a netboot and our server replies with the same EFI file. But as soon as I kill our server (Auroraboot), iPXE drops me into a shell. From there I tried:

initrd http://192.168.1.36:8000/kairos-uki.iso
chain http://192.168.1.36:8000/memdisk iso raw

with the memdisk I found on my Arch Linux machine under /usr/lib/syslinux/bios/memdisk. This resulted in an "Exec format error" which, according to ChatGPT, was because:

memdisk doesn't work in UEFI, and there's no UEFI equivalent.

If you're trying to boot something like Ubuntu Live, Rescue CD, or a custom Linux environment, extracting the ISO and booting with kernel+initrd is the right path.

Would you like help building a working boot.ipxe for a specific distro like Ubuntu or Arch?

The next thing to try is netboot.xyz, which supposedly boots EFI ISOs.
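
As a side note on the boot loop described above (iPXE asks for a netboot and gets ipxe.efi again): a common way to break that kind of loop is to answer differently when the client identifies itself as iPXE, which it does through the DHCP user class (option 77). The sketch below only shows the selection logic. The file names and the function bootFileFor are hypothetical, this is not something this PR implements, and I have not checked whether the netboot library we use exposes the user class.

```go
package main

import "fmt"

// Hypothetical file names, used only for illustration.
const (
	firstStageLoader = "ipxe.efi"
	ipxeScript       = "boot.ipxe"
)

// bootFileFor picks the boot file for a DHCP/PXE request based on the
// client's user class (DHCP option 77). iPXE identifies itself as "iPXE";
// plain firmware sends something else or nothing.
func bootFileFor(userClass string) string {
	if userClass == "iPXE" {
		// Second request, now coming from iPXE: hand out a script
		// instead of the loader, so the client does not loop.
		return ipxeScript
	}
	// First request, coming from the firmware: hand out the loader.
	return firstStageLoader
}

func main() {
	for _, uc := range []string{"", "iPXE"} {
		fmt.Printf("user class %q -> %s\n", uc, bootFileFor(uc))
	}
}
```

This would only remove the need to kill the server by hand; it does not answer what the iPXE script should then chainload.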

@jimmykarily
Contributor Author

netboot.xyz boots "modified" versions of upstream distros:

Many Operating System projects provide their software as an ISO only or provide a Live CD/DVD that you can download and boot into memory without modifying the storage of the machine. Typically you then have the option to do an install from the live system. These are typically heavier weight installs and can take a lot of bandwidth to install. iPXE generally does not boot the ISOs directly that well.

In order for us to make it easy to consume those types of images, we monitor new version updates from upstream, retrieve the releases, extract them, and re-release them with modifications to the initrd as needed to make them iPXE friendly. We then can load the smaller size kernel directly into memory for a better and more consistent experience.

not generic ISOs.

Signed-off-by: Dimitris Karakasilis <[email protected]>

codecov bot commented May 8, 2025

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 29.80%. Comparing base (2687875) to head (2f079f6).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/cmd/netboot.go | 0.00% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #269      +/-   ##
==========================================
- Coverage   29.85%   29.80%   -0.06%     
==========================================
  Files          19       19              
  Lines        2656     2661       +5     
==========================================
  Hits          793      793              
- Misses       1738     1743       +5     
  Partials      125      125              

@jimmykarily
Contributor Author

I tried several more things, with no success:

  • Enrolled the keys using the ISO and then tried to boot the (signed) norole.efi file directly. It fails with "Access Denied", which implies the keys don't match, but I have no idea why. I even enrolled the same keys manually to be sure.
  • Tried to netboot the GRUB EFI stub, loop mount the ISO/efiboot.img/whatever and then chainload to either systemd-boot or norole.efi. None of the combinations worked. In the best case, the systemd-boot menu loaded but didn't see the conf files, so the menu was empty. The assumption is that as soon as we chainload to systemd-boot, the loop mount is gone and not visible to systemd-boot.

@Itxaka suggested one more idea on Slack: export an iSCSI LUN from Auroraboot and have iPXE mount the network iSCSI target (I need to clarify what that means). Let's give that a final try; if that fails too, I think I'm out of ideas.

One last thing to try is to find a known-to-work, Microsoft-signed EFI file and netboot that. If that fails too, then the problem is not our signing of EFI files but the firmware not supporting Secure Boot over netboot.

@jimmykarily
Contributor Author

I tried to set up an iSCSI target both with Auroraboot and with a simple container, but it seems to be very hard to set up containerized. It requires dbus and specific services to be up. I tried various hacks to work around these issues, but it seems it will need specific modules to be enabled on the host anyway. It doesn't seem like a feasible solution to me.

I'll spend the rest of my day investigating whether HTTP boot is a better option, so we have an idea for Monday's planning.

@jimmykarily
Contributor Author

After a little investigation, it seems that using HTTP boot only changes how we load the EFI file. The rest of the problems remain:

  • Which EFI file should we load?
  • How do we perform auto-enrollment of keys?
  • If we manually enroll and boot the Kairos EFI file, does the firmware allow us to boot? (With PXE boot it was failing although the keys were the same.)
