mkeeter (Collaborator) commented Aug 12, 2025

Follow-up to #2138

Opening as a draft so I can test this out on the relevant systems.

mkeeter requested a review from labbott August 12, 2025 19:58

mkeeter (Collaborator, Author) commented Aug 13, 2025

I have now tested this out on London.

Executive summary

Gimlet, Sidecar, and PSC all work as expected. Cosmo failed to update the RoT for Mysterious Reasons.

Summary for middle management

  • Before this change, the Gimlet that I connected to didn't contain an SP measurement (checked with verifier-cli in the host using the ipcc interface)
    • Resetting the SP produced a measurement, as expected (because the RoT catches the reset and measures it)
    • Power-cycling the Gimlet did not produce a measurement, also as expected (because the RoT boots slower than the SP)
  • With the SP image from this PR flashed onto that Gimlet, reset was noticeably slower (15+ seconds), because the SP resets itself multiple times and the RoT measures it each time. This is fine.
  • With both the SP and RoT updated from this PR, everything works as expected!
    • Resetting the SP is back to the previous speed (~2 seconds), because the new RoT firmware sets the measurement token instructing the SP to boot after it's been measured once
    • Power-cycling the Gimlet does produce an SP measurement (because the SP resets itself for long enough for the RoT to boot and catch it)
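The handoff described above can be illustrated with a toy model. This is a hypothetical simulation, not the Hubris implementation. The boot time (SP_BOOT_S), the self-reset limit (MAX_SELF_RESETS), and the rule that a running RoT measures every SP reset are illustrative assumptions. They were chosen only to reproduce the qualitative behavior in the bullets.

```python
"""Toy model of the SP/RoT measurement-token handoff (hypothetical parameters)."""
from dataclasses import dataclass

# ASSUMPTION: illustrative values only, not taken from the firmware.
SP_BOOT_S = 2.0        # time for one SP boot before it checks for a token
MAX_SELF_RESETS = 6    # how many times the SP resets itself without a token


@dataclass
class Outcome:
    boots: int         # number of SP boots, including the first
    measurements: int  # number of SP resets the RoT caught and measured
    elapsed_s: float   # time until the SP stops resetting itself


def simulate(rot_ready_s: float, rot_sets_token: bool,
             max_self_resets: int = MAX_SELF_RESETS) -> Outcome:
    """Model an SP that keeps resetting itself until it sees a token.

    rot_ready_s: when the RoT finishes booting (0 if it is already running).
    rot_sets_token: True for RoT firmware that deposits the measurement token.
    max_self_resets: 0 models SP firmware without the self-reset loop.
    """
    t, boots, measurements, token = 0.0, 0, 0, False
    while True:
        # The RoT can only catch (and measure) this reset if it is running.
        if t >= rot_ready_s:
            measurements += 1
            token = token or rot_sets_token
        t += SP_BOOT_S
        boots += 1
        # With a token, or after too many attempts, the SP boots normally.
        if token or boots > max_self_resets:
            return Outcome(boots, measurements, t)


if __name__ == "__main__":
    cases = {
        "new SP, old RoT, SP reset": simulate(0.0, rot_sets_token=False),
        "new SP, new RoT, SP reset": simulate(0.0, rot_sets_token=True),
        "new SP, new RoT, power cycle": simulate(5.0, rot_sets_token=True),
        "old SP, power cycle": simulate(5.0, rot_sets_token=True, max_self_resets=0),
    }
    for name, outcome in cases.items():
        print(f"{name}: {outcome}")
```

With these made-up numbers, the model shows the pattern in the bullets:

- With the old RoT, the SP boots many times and is measured on each boot.
- Once the RoT deposits the token, the SP boots only once.
- After a power cycle, the SP gets a measurement only if it resets itself long enough for the RoT to come up.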

I also tested on Sidecar and PSC. In both of these cases, we can't directly check measurements, so I just examined timing. Behavior was the same as the Gimlet, matching my expectations:

  • With just the new SP image, resets were significantly slower (because the SP reset itself multiple times and was measured each time)
  • With both the SP and RoT images updated, resets were their usual speed

This is all well and good, except for Cosmo: the SP on Cosmo dropped off the face of the earth while I was updating the RoT image. It's unclear whether this is a resurgence of #2157 or something else. Attempts to recover it with Ignition failed due to oxidecomputer/quartz#401, so we're waiting for someone to physically re-rack the sled (BRM13250012).

Exhaustive testing log

Claim a rack and check versions

inventron hold london 4h
matt@castle ~ $ pilot -rlondon sp ls
MAC               SERIAL      TYPE    IMAGE            IP
a8:40:25:04:02:02 BRM42220036 gimlet  60632b36bb64fc92 fe80::aa40:25ff:fe04:202
a8:40:25:04:02:47 BRM42220030 gimlet  64f0ebe7051dc11f fe80::aa40:25ff:fe04:247
a8:40:25:04:04:02 BRM13250012 cosmo   031a9e9e596d94c6 fe80::aa40:25ff:fe04:402
a8:40:25:04:0c:86 BRM22250001 cosmo   031a9e9e596d94c6 fe80::aa40:25ff:fe04:c86
a8:40:25:05:07:00 BRM44220013 sidecar dcf1e97ab2e9971d fe80::aa40:25ff:fe05:700
a8:40:25:05:27:00 BRM31230004 sidecar 710e93a75e01c3c5 fe80::aa40:25ff:fe05:2700
a8:40:25:06:01:08 BRM11230017 psc     02babcfe6bda5b41 fe80::aa40:25ff:fe06:108
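To confirm that a rack has every board type needed, the pilot sp ls table can be parsed mechanically. This is a sketch. It assumes the columns are whitespace-separated and that no value contains spaces, which holds for the output shown here. It also handles the switch-zone variant with a CUBBY column, where empty cubbies print "-".

```python
"""Parse `pilot sp ls` output and check which board types are present."""


def parse_sp_ls(output: str) -> list[dict[str, str]]:
    """Turn the whitespace-aligned table into one dict per populated row."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        row = dict(zip(header, fields))
        # Empty cubbies are printed with "-" placeholders; skip them.
        if len(fields) == len(header) and row.get("SERIAL") != "-":
            rows.append(row)
    return rows


def missing_types(rows: list[dict[str, str]],
                  required: tuple[str, ...] = ("gimlet", "cosmo", "sidecar", "psc")) -> list[str]:
    """Board types from `required` that do not appear in the parsed table."""
    present = {row["TYPE"] for row in rows}
    return [t for t in required if t not in present]
```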

Great, this rack has all of the required hardware for testing. Let's get
hardware versions:

pilot -rlondon sp exec -e "read-component-caboose --component sp NAME" BRM42220036
# repeat for other serials
  • BRM42220030: gimlet-c
  • BRM42220036: gimlet-c-dev
  • BRM13250012: cosmo-a-dev
  • BRM22250001: cosmo-a-dev
  • BRM44220013: sidecar-b
  • BRM31230004: sidecar-c
  • BRM11230017: psc-c

Read the measurement log from a Gimlet

matt@castle ~ $ pilot -rlondon tp login any
The illumos Project     helios-2.0.23476        July 2025
root@oxz_switch1:~# pilot sp ls
CUBBY SERIAL      TYPE    IMAGE            IP
0     -           -       -                -
1     -           -       -                -
2     -           -       -                -
3     -           -       -                -
4     -           -       -                -
5     -           -       -                -
6     -           -       -                -
7     -           -       -                -
8     -           -       -                -
9     -           -       -                -
10    -           -       -                -
11    -           -       -                -
12    -           -       -                -
13    -           -       -                -
14    BRM42220036 gimlet  60632b36bb64fc92 fe80::aa40:25ff:fe04:20a
15    BRM13250012 cosmo   031a9e9e596d94c6 fe80::aa40:25ff:fe04:40a
16    BRM42220030 gimlet  64f0ebe7051dc11f fe80::aa40:25ff:fe04:24f
17    BRM22250001 cosmo   031a9e9e596d94c6 fe80::aa40:25ff:fe04:c8e
18    -           -       -                -
19    -           -       -                -
20    -           -       -                -
21    -           -       -                -
22    -           -       -                -
23    -           -       -                -
24    -           -       -                -
25    -           -       -                -
26    -           -       -                -
27    -           -       -                -
28    -           -       -                -
29    -           -       -                -
30    -           -       -                -
31    -           -       -                -

Let's target BRM42220030 (cubby 16).

First, I need a way to check for measurements. Let's build verifier-cli on an
illumos machine, e.g. atrium:

git clone git@github.com:oxidecomputer/dice-util
cd dice-util
cargo build -p verifier-cli --features=ipcc
cp target/debug/verifier-cli /staff/matt/hubris-2192/

Then, copy it to the Gimlet:

matt@castle ~ $ pilot -rlondon tp copy to -i /staff/matt/hubris-2192/verifier-cli -o /tmp/verifier-cli BRM42220030
matt@castle ~ $ pilot -rlondon tp login BRM42220030
# Escape to the root zone
root@oxz_switch1:~# pilot host copy to -i /tmp/verifier-cli -o /tmp/verifier-cli BRM42220030
root@oxz_switch1:~# pilot host login BRM42220030
BRM42220030 # /tmp/verifier-cli --interface=ipcc log
{"index":0,"measurements":[{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}]}

Note that there are no measurements: index is 0 and every SHA3-256 slot is all zeros.
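Reading 32-element byte arrays by eye is error-prone, so here is a small sketch that summarizes the JSON printed by verifier-cli --interface=ipcc log. It assumes only the structure visible in this log: an index field plus a measurements list of {"Sha3_256": [32 bytes]} entries, where unused slots are all zeros.

```python
"""Summarize the JSON measurement log printed by `verifier-cli ... log`."""
import json


def nonzero_measurements(log_json: str) -> tuple[int, list[tuple[int, str]]]:
    """Return the log's index and (slot, hex digest) for each non-zero slot."""
    log = json.loads(log_json)
    found = []
    for slot, entry in enumerate(log["measurements"]):
        digest = bytes(entry["Sha3_256"])  # 32 bytes per SHA3-256 slot
        if any(digest):                    # all-zero slots are unused
            found.append((slot, digest.hex()))
    return log["index"], found
```

Applied to the log above, this returns index 0 and an empty list. Applied to a log with one measurement, it returns that slot's digest as a 64-character hex string.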

Reset a Gimlet SP and see what happens

We expect that if we reset the SP, it should reboot with a measurement (because
the RoT remains running and will notice the reset).

matt@castle ~ $ pilot -rlondon sp exec -e "reset" BRM42220030
Aug 13 13:34:49.052 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 13:34:49.055 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:247%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 13:34:49.056 INFO SP is prepared to reset, component: faux-mgs
Aug 13 13:34:51.059 INFO SP reset complete, component: faux-mgs

Now, we expect measurements to exist because we reset the SP while the RoT
was online.

Re-enable the tp beacon so we can find it again:

matt@castle ~ $ pilot -rlondon sp console BRM42220030
BRM42220030 # svcadm enable svc:/site/compliance/beacon:default

That's not sufficient; we also have to run this:

matt@castle ~ $ PILOT_RACK=london /opt/rackletteadm/scripts/london/enable-tp-announce.sh
environment "london" is held by you: matt (Matt Keeter)
 * locating hosts on interface "london_host0" ...
 * locating hosts on interface "london_host1" ...

...and we apparently have to run it twice for it to work.

Okay, now verifier-cli shows a measurement in the first slot:

BRM42220030 # /tmp/verifier-cli --interface=ipcc log
{"index":1,"measurements":[{"Sha3_256":[157,109,195,234,243,219,234,20,105,249,84,49,63,66,35,236,191,78,141,9,81,2,208,135,156,254,228,78,2,69,188,28]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}]}

This is Working As Expected

Power-cycle the whole sled and see what happens

We expect the SP to come up without measurements, because it boots faster than
the RoT.

We are targeting BRM42220030 in cubby 16, which is Ignition target 4 (see RFD
144).

Here's a Sidecar SP (found via pilot sp ls -x in the switch zone):

matt@castle ~ $ faux-mgs --interface london_sw1tp0 --discovery-addr='[fe80::aa40:25ff:fe05:2701]:11111' state

It sees the Gimlet at target 4:

matt@castle ~ $ faux-mgs --interface london_sw1tp0 --discovery-addr='[fe80::aa40:25ff:fe05:2701]:11111' ignition 4
Aug 13 13:56:57.179 INFO creating SP handle on interface london_sw1tp0, component: faux-mgs
Aug 13 13:56:57.181 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe05:2701%15]:11111, interface: london_sw1tp0, socket: control-plane-agent, component: faux-mgs
target 4: IgnitionState { receiver: ReceiverStatus { aligned: true, locked: true, polarity_inverted: false }, target: Some(TargetState { system_type: Gimlet, power_state: On, power_reset_in_progress: false, faults: SystemFaults { power_a3: false, power_a2: false, sp: false, rot: false }, controller0_present: true, controller1_present: true, link0_receiver_status: ReceiverStatus { aligned: true, locked: true, polarity_inverted: false }, link1_receiver_status: ReceiverStatus { aligned: true, locked: true, polarity_inverted: false } }) }

Time to say good night:

matt@castle ~ $ faux-mgs --interface london_sw1tp0 --discovery-addr='[fe80::aa40:25ff:fe05:2701]:11111' ignition-command 4 power-reset
Aug 13 13:57:46.523 INFO creating SP handle on interface london_sw1tp0, component: faux-mgs
Aug 13 13:57:46.526 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe05:2701%15]:11111, interface: london_sw1tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 13:57:46.527 INFO ignition command PowerReset send to target 4, component: faux-mgs
successfully sent PowerReset

...now, we wait for it to reboot

Sure enough, no measurements.

BRM42220030 # /tmp/verifier-cli --interface=ipcc log
{"index":0,"measurements":[{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}]}

Build SP images and copy them to /staff

cargo xtask dist app/gimlet/rev-c-dev.toml
scp target/gimlet-c-dev/dist/default/build-gimlet-c-dev-image-default.zip atrium:/staff/matt/hubris-2192/gimlet-c-dev.zip
cargo xtask dist app/cosmo/rev-a-dev.toml
scp target/cosmo-a-dev/dist/default/build-cosmo-a-dev-image-default.zip atrium:/staff/matt/hubris-2192/cosmo-a-dev.zip
cargo xtask dist app/sidecar/rev-b-dev.toml
scp target/sidecar-b-dev/dist/default/build-sidecar-b-dev-image-default.zip atrium:/staff/matt/hubris-2192/sidecar-b-dev.zip
cargo xtask dist app/psc/rev-c-dev.toml
scp target/psc-c-dev/dist/default/build-psc-c-dev-image-default.zip atrium:/staff/matt/hubris-2192/psc-c-dev.zip

(Of the two Sidecars, I'm only planning to flash the rev B one, BRM44220013.)
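The four build-and-copy pairs above follow one naming pattern. A sketch that regenerates them is below. It assumes, as in this log, that app/BOARD/rev-REV-dev.toml produces target/BOARD-REV-dev/dist/default/build-BOARD-REV-dev-image-default.zip. The function only prints commands and does not run them.

```python
"""Generate the build-and-copy commands used above for each SP image."""

DEST = "atrium:/staff/matt/hubris-2192"  # staging directory used in this log


def sp_image_commands(board: str, rev: str) -> list[str]:
    # ASSUMPTION: app/<board>/rev-<rev>-dev.toml builds the <board>-<rev>-dev
    # image at the path below, matching the commands in this log.
    name = f"{board}-{rev}-dev"
    archive = f"target/{name}/dist/default/build-{name}-image-default.zip"
    return [
        f"cargo xtask dist app/{board}/rev-{rev}-dev.toml",
        f"scp {archive} {DEST}/{name}.zip",
    ]


if __name__ == "__main__":
    for board, rev in [("gimlet", "c"), ("cosmo", "a"), ("sidecar", "b"), ("psc", "c")]:
        print("\n".join(sp_image_commands(board, rev)))
```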

Flash new SP image and reset the SP

We expect the reset delay to be longer. Before, it was about 2 seconds.

pilot -rlondon sp exec -e "update sp 0 /staff/matt/hubris-2192/gimlet-c-dev.zip" BRM42220030

Now, do the reset. We expect it to take noticeably longer, because the SP waits for the RoT to provide a token
(and we haven't updated the RoT yet, so no token arrives).

matt@castle ~ $ pilot -rlondon sp exec -e "reset" BRM42220030
Aug 13 14:10:12.761 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 14:10:12.763 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:247%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 14:10:12.763 INFO SP is prepared to reset, component: faux-mgs
Aug 13 14:10:12.768 INFO using watchdog during reset, watchdog_timeout_ms: 90000, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 14:10:28.774 INFO disabling watchdog, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 14:10:28.779 INFO SP reset complete, component: faux-mgs
reset complete

It took about 16 seconds. Is this what we expected? Yes: the SP resets itself
several times, and the RoT measures it each time.
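Reset times in this log are read off faux-mgs timestamps. The helper below is a sketch that computes them from the "SP is prepared to reset" and "SP reset complete" lines. It assumes the timestamp format shown here. Because the timestamps omit the year, the helper supplies one; this is an assumption needed only so the timestamp can be parsed.

```python
"""Measure SP reset time from faux-mgs log lines."""
from datetime import datetime

# ASSUMPTION: faux-mgs timestamps omit the year; we supply one so that
# strptime can parse the month/day unambiguously.
YEAR = 2025


def _timestamp(line: str) -> datetime:
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")


def reset_duration_s(log: str) -> float:
    """Seconds between 'SP is prepared to reset' and 'SP reset complete'."""
    start = end = None
    for line in log.splitlines():
        # Match the trailing comma so `reset-component rot` lines are ignored.
        if "SP is prepared to reset," in line:
            start = _timestamp(line)
        elif "SP reset complete," in line:
            end = _timestamp(line)
    if start is None or end is None:
        raise ValueError("log has no complete SP reset")
    return (end - start).total_seconds()
```

For the reset log above, this gives 16.016 seconds.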

Create signed RoT images and copy them to /staff

cargo xtask dist app/oxide-rot-1/app.toml
export PERMSLIP_URL=https://permslip-staging.corp.oxide.computer
permslip sign "Gimlet RoT Code Signing Staging Development Signer A1" target/oxide-rot-1/dist/a/build-oxide-rot-1-image-a.zip --version 0.0.0-testing --out oxide-rot-1-gimlet-a.zip
permslip sign "Gimlet RoT Code Signing Staging Development Signer A1" target/oxide-rot-1/dist/b/build-oxide-rot-1-image-b.zip --version 0.0.0-testing --out oxide-rot-1-gimlet-b.zip
permslip sign "Sidecar RoT Code Signing Staging Development Signer A1" target/oxide-rot-1/dist/a/build-oxide-rot-1-image-a.zip --version 0.0.0-testing --out oxide-rot-1-sidecar-a.zip
permslip sign "Sidecar RoT Code Signing Staging Development Signer A1" target/oxide-rot-1/dist/b/build-oxide-rot-1-image-b.zip --version 0.0.0-testing --out oxide-rot-1-sidecar-b.zip
permslip sign "PSC RoT Code Signing Staging Development Signer A1" target/oxide-rot-1/dist/a/build-oxide-rot-1-image-a.zip --version 0.0.0-testing --out oxide-rot-1-psc-a.zip
permslip sign "PSC RoT Code Signing Staging Development Signer A1" target/oxide-rot-1/dist/b/build-oxide-rot-1-image-b.zip --version 0.0.0-testing --out oxide-rot-1-psc-b.zip

# Cosmo requires -selfsigned images, which are signed with Bartholomew keys
# and without `dice-mfg`
cargo xtask dist app/oxide-rot-1/app-dev.toml # oxide-rot-1-selfsigned
cp target/oxide-rot-1-selfsigned/dist/a/build-oxide-rot-1-selfsigned-image-a.zip oxide-rot-1-cosmo-a.zip
cp target/oxide-rot-1-selfsigned/dist/b/build-oxide-rot-1-selfsigned-image-b.zip oxide-rot-1-cosmo-b.zip

# Copy everything to Atrium
scp oxide-rot-1-*.zip atrium:/staff/matt/hubris-2192
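The six permslip invocations differ only in signer name and slot. This sketch regenerates them, using the signer names and output file names from the commands above. Cosmo is left out because it uses the self-signed build instead. The function only builds command strings and does not call permslip.

```python
"""Generate the permslip signing commands used above (Gimlet, Sidecar, PSC)."""

SIGNERS = {
    "gimlet": "Gimlet RoT Code Signing Staging Development Signer A1",
    "sidecar": "Sidecar RoT Code Signing Staging Development Signer A1",
    "psc": "PSC RoT Code Signing Staging Development Signer A1",
}


def sign_commands(version: str = "0.0.0-testing") -> list[str]:
    cmds = []
    for product, signer in SIGNERS.items():
        for slot in ("a", "b"):
            # Each RoT slot has its own build, so each is signed separately.
            src = f"target/oxide-rot-1/dist/{slot}/build-oxide-rot-1-image-{slot}.zip"
            out = f"oxide-rot-1-{product}-{slot}.zip"
            cmds.append(f'permslip sign "{signer}" {src} --version {version} --out {out}')
    return cmds
```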

Flash the new RoT images over the network

We'll do an RoT update on the same SP. First, let's get the system state:

matt@castle ~ $ pilot -rlondon sp exec -e "state" BRM42220030
Aug 13 14:12:49.972 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 14:12:49.978 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:247%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 14:12:49.984 INFO V2(SpStateV2 { hubris_archive_id: [187, 215, 6, 74, 234, 184, 79, 246], serial_number: [66, 82, 77, 52, 50, 50, 50, 48, 48, 51, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], model: [57, 49, 51, 45, 48, 48, 48, 48, 48, 49, 57, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], revision: 6, base_mac_address: [168, 64, 37, 4, 2, 71], power_state: A0, rot: Ok(RotStateV2 { active: A, persistent_boot_preference: A, pending_persistent_boot_preference: None, transient_boot_preference: None, slot_a_sha3_256_digest: Some([172, 141, 158, 28, 78, 136, 79, 143, 26, 187, 166, 228, 63, 95, 38, 218, 45, 229, 44, 159, 49, 103, 89, 34, 108, 141, 139, 42, 243, 39, 3, 160]), slot_b_sha3_256_digest: Some([89, 192, 21, 184, 144, 5, 17, 121, 60, 4, 255, 70, 40, 226, 3, 91, 208, 92, 22, 14, 103, 200, 34, 165, 246, 7, 8, 140, 211, 181, 234, 151]) }) }), component: faux-mgs
hubris archive: bbd7064aeab84ff6
serial number: BRM42220030
model: 913-0000019
revision: 6
base MAC address: a8:40:25:04:02:47
power state: A0
rot: Ok(RotStateV2 {
active: A,
persistent_boot_preference: A,
pending_persistent_boot_preference: None,
transient_boot_preference: None,
slot_a_sha3_256_digest: Some("ac8d9e1c4e884f8f1abba6e43f5f26da2de52c9f316759226c8d8b2af32703a0"),
slot_b_sha3_256_digest: Some("59c015b8900511793c04ff4628e2035bd05c160e67c822a5f607088cd3b5ea97"),
}

)

The RoT is running from slot A, so we flash the B image into slot 1.
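The human-readable summary under the debug output is derived from the raw byte arrays in SpStateV2. The sketch below decodes those arrays and picks the slot to flash. It assumes the serial and model fields are NUL-padded ASCII, as the values above suggest. It also assumes slot A = 0 and slot B = 1, matching the update rot 1 and --set 1 commands in this log.

```python
"""Decode raw SpStateV2 fields from the faux-mgs `state` debug output."""


def c_string(raw: list[int]) -> str:
    # Serial number and model are fixed-size, NUL-padded ASCII buffers.
    return bytes(raw).split(b"\0", 1)[0].decode("ascii")


def hex_string(raw: list[int]) -> str:
    # Archive IDs and slot digests are shown as lowercase hex in the summary.
    return bytes(raw).hex()


def slot_to_flash(active: str) -> int:
    # ASSUMPTION (from this log): slot A is component slot 0, slot B is 1.
    # We always flash the slot that is not currently active.
    return {"A": 1, "B": 0}[active]
```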

pilot -rlondon sp exec -e "update rot 1 /staff/matt/hubris-2192/oxide-rot-1-gimlet-b.zip" BRM42220030

Then select the new slot and reboot into it:

matt@castle ~ $ pilot -rlondon sp exec -e "component-active-slot rot --persist --set 1" BRM42220030
Aug 13 14:15:46.282 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 14:15:46.285 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:247%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
set active slot for SpComponent { id: "rot" } to 1
matt@castle ~ $ pilot -rlondon sp exec -e "state" BRM42220030
Aug 13 14:15:51.001 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 14:15:51.003 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:247%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 14:15:51.010 INFO V2(SpStateV2 { hubris_archive_id: [187, 215, 6, 74, 234, 184, 79, 246], serial_number: [66, 82, 77, 52, 50, 50, 50, 48, 48, 51, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], model: [57, 49, 51, 45, 48, 48, 48, 48, 48, 49, 57, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], revision: 6, base_mac_address: [168, 64, 37, 4, 2, 71], power_state: A0, rot: Ok(RotStateV2 { active: A, persistent_boot_preference: A, pending_persistent_boot_preference: Some(B), transient_boot_preference: None, slot_a_sha3_256_digest: Some([172, 141, 158, 28, 78, 136, 79, 143, 26, 187, 166, 228, 63, 95, 38, 218, 45, 229, 44, 159, 49, 103, 89, 34, 108, 141, 139, 42, 243, 39, 3, 160]), slot_b_sha3_256_digest: Some([89, 192, 21, 184, 144, 5, 17, 121, 60, 4, 255, 70, 40, 226, 3, 91, 208, 92, 22, 14, 103, 200, 34, 165, 246, 7, 8, 140, 211, 181, 234, 151]) }) }), component: faux-mgs
hubris archive: bbd7064aeab84ff6
serial number: BRM42220030
model: 913-0000019
revision: 6
base MAC address: a8:40:25:04:02:47
power state: A0
rot: Ok(RotStateV2 {
active: A,
persistent_boot_preference: A,
pending_persistent_boot_preference: Some(B),
transient_boot_preference: None,
slot_a_sha3_256_digest: Some("ac8d9e1c4e884f8f1abba6e43f5f26da2de52c9f316759226c8d8b2af32703a0"),
slot_b_sha3_256_digest: Some("59c015b8900511793c04ff4628e2035bd05c160e67c822a5f607088cd3b5ea97"),
}

)
matt@castle ~ $ pilot -rlondon sp exec -e "reset-component rot" BRM42220030
Aug 13 14:16:14.403 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 14:16:14.406 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:247%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 14:16:14.406 INFO SP is prepared to reset component rot, component: faux-mgs
Aug 13 14:16:14.412 INFO SP reset component rot complete, component: faux-mgs
reset complete

The RoT is now running from slot B, and the slot B digest has changed:

matt@castle ~ $ pilot -rlondon sp exec -e "state" BRM42220030
Aug 13 14:16:23.962 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 14:16:23.964 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:247%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 14:16:23.971 INFO V2(SpStateV2 { hubris_archive_id: [187, 215, 6, 74, 234, 184, 79, 246], serial_number: [66, 82, 77, 52, 50, 50, 50, 48, 48, 51, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], model: [57, 49, 51, 45, 48, 48, 48, 48, 48, 49, 57, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], revision: 6, base_mac_address: [168, 64, 37, 4, 2, 71], power_state: A0, rot: Ok(RotStateV2 { active: B, persistent_boot_preference: B, pending_persistent_boot_preference: None, transient_boot_preference: None, slot_a_sha3_256_digest: Some([172, 141, 158, 28, 78, 136, 79, 143, 26, 187, 166, 228, 63, 95, 38, 218, 45, 229, 44, 159, 49, 103, 89, 34, 108, 141, 139, 42, 243, 39, 3, 160]), slot_b_sha3_256_digest: Some([33, 181, 241, 7, 63, 79, 141, 109, 239, 91, 100, 81, 69, 127, 207, 163, 34, 63, 205, 44, 28, 78, 11, 217, 83, 180, 231, 227, 28, 5, 169, 36]) }) }), component: faux-mgs
hubris archive: bbd7064aeab84ff6
serial number: BRM42220030
model: 913-0000019
revision: 6
base MAC address: a8:40:25:04:02:47
power state: A0
rot: Ok(RotStateV2 {
active: B,
persistent_boot_preference: B,
pending_persistent_boot_preference: None,
transient_boot_preference: None,
slot_a_sha3_256_digest: Some("ac8d9e1c4e884f8f1abba6e43f5f26da2de52c9f316759226c8d8b2af32703a0"),
slot_b_sha3_256_digest: Some("21b5f1073f4f8d6def5b6451457fcfa3223fcd2c1c4e0bd953b4e7e31c05a924"),
}

)
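The RoT update above (flash, select slot, reset, verify) is repeated later for PSC and Sidecar. The sketch below builds that sequence as pilot invocations using only the faux-mgs commands that appear in this log. One assumption is labeled in the code. When no rack is given, pilot is assumed to pick it up from the environment, as the later commands that omit -rlondon suggest.

```python
"""Build the pilot/faux-mgs command sequence for an RoT update, as run above."""
from typing import Optional


def rot_update_commands(slot: int, image: str) -> list[str]:
    return [
        f"update rot {slot} {image}",                         # write the inactive slot
        f"component-active-slot rot --persist --set {slot}",  # prefer it on next boot
        "reset-component rot",                                # reboot the RoT into it
        "state",                                              # check active slot and digest
    ]


def pilot_invocations(serial: str, commands: list[str],
                      rack: Optional[str] = None) -> list[str]:
    # ASSUMPTION: without -r<rack>, pilot takes the rack from its environment.
    prefix = f"pilot -r{rack}" if rack else "pilot"
    return [f'{prefix} sp exec -e "{cmd}" {serial}' for cmd in commands]
```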

If we reset the SP again, we should see a much shorter delay:

matt@castle ~ $ pilot -rlondon sp exec -e "reset" BRM42220030
Aug 13 14:16:53.155 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 14:16:53.157 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:247%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 14:16:53.158 INFO SP is prepared to reset, component: faux-mgs
Aug 13 14:16:55.161 INFO SP reset complete, component: faux-mgs
reset complete

Success; this is the same 2-ish second delay as before!

Power cycle the Gimlet and confirm that it was measured

Same commands as before, but this time we expect the RoT to get a measurement.

BRM42220030 # /tmp/verifier-cli --interface=ipcc log
{"index":1,"measurements":[{"Sha3_256":[35,166,82,22,169,64,56,175,125,183,185,192,127,5,109,12,26,248,78,9,228,83,8,123,172,246,74,55,171,102,239,93]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},{"Sha3_256":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}]}

Success!

Same test on Cosmo

matt@castle ~ $ pilot sp exec -e 'update sp 0 /staff/matt/hubris-2192/cosmo-a-dev.zip' BRM13250012
matt@castle ~ $ pilot sp exec -e reset BRM13250012
Aug 13 15:05:55.271 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 15:05:55.274 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:402%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:05:55.274 INFO SP is prepared to reset, component: faux-mgs
Aug 13 15:05:55.280 INFO using watchdog during reset, watchdog_timeout_ms: 90000, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:06:21.293 INFO disabling watchdog, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:06:21.296 INFO SP reset complete, component: faux-mgs

The reset took 26 seconds to complete, somewhat longer than the ~16 seconds on
Gimlet, but probably not a concern?

matt@castle ~ $ pilot sp exec -e "update rot 1 /staff/matt//hubris-2192/oxide-rot-1-cosmo-b.zip" BRM13250012
Aug 13 15:08:39.613 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 15:08:39.616 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:402%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:08:39.639 INFO generated update ID, id: 1ba66d0f-d11d-4b17-b102-8cea44dbe63d, component: faux-mgs
Aug 13 15:08:39.664 INFO starting update, total_size: 214984, id: 1ba66d0f-d11d-4b17-b102-8cea44dbe63d, component: rot, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:08:39.670 INFO update in progress, total_size: 214984, bytes_received: 0, component: faux-mgs
Aug 13 15:08:39.671 INFO update preparation complete, update_id: 1ba66d0f-d11d-4b17-b102-8cea44dbe63d, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:08:40.677 INFO update in progress, total_size: 214984, bytes_received: 25428, component: faux-mgs
Aug 13 15:08:41.703 INFO update in progress, total_size: 214984, bytes_received: 50856, component: faux-mgs
Aug 13 15:08:42.709 INFO update in progress, total_size: 214984, bytes_received: 76284, component: faux-mgs
Aug 13 15:08:53.430 ERRO update failed, error: RPC call failed (gave up after 5 attempts), id: 1ba66d0f-d11d-4b17-b102-8cea44dbe63d, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
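To quantify how far the failed update got, the progress lines can be parsed. This sketch relies only on the total_size and bytes_received fields shown in the log above. It reports the fraction of the image delivered according to the last progress line.

```python
"""Estimate how far an update got from faux-mgs progress lines."""
import re
from typing import Optional

PROGRESS = re.compile(r"total_size: (\d+), bytes_received: (\d+)")


def last_progress(log: str) -> Optional[float]:
    """Fraction of the image delivered, per the last progress line (or None)."""
    matches = PROGRESS.findall(log)
    if not matches:
        return None
    total, received = (int(x) for x in matches[-1])
    return received / total
```

For the Cosmo update above, the last progress line reports 76284 of 214984 bytes. That is about 35% of the image, delivered before the RPC calls started failing.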

Concern: the SP is now off the network

This is cubby 15, previously listed as:

15    BRM13250012 cosmo   031a9e9e596d94c6 fe80::aa40:25ff:fe04:40a

We can't connect to the Cosmo, and Ignition doesn't recover it.


Test PSC

matt@castle ~ $ pilot sp exec -e 'update sp 0 /staff/matt/hubris-2192/psc-c-dev.zip' BRM11230017
matt@castle ~ $ pilot sp exec -e reset BRM11230017
Aug 13 15:43:01.866 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 15:43:01.869 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe06:108%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:43:01.869 INFO SP is prepared to reset, component: faux-mgs
Aug 13 15:43:01.874 INFO using watchdog during reset, watchdog_timeout_ms: 90000, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:43:16.882 INFO disabling watchdog, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:43:16.886 INFO SP reset complete, component: faux-mgs
reset complete

15 seconds of reset time, about the same as Gimlet (16 seconds).

Now, update the RoT

pilot sp exec -e "update rot 1 /staff/matt/hubris-2192/oxide-rot-1-psc-b.zip" BRM11230017
pilot sp exec -e "component-active-slot rot --persist --set 1" BRM11230017
pilot sp exec -e "reset-component rot" BRM11230017
pilot sp exec -e "state" BRM11230017 # confirm that we're in slot B
matt@castle ~ $ pilot sp exec -e reset BRM11230017
Aug 13 15:46:46.351 INFO creating SP handle on interface london_sw0tp0, component: faux-mgs
Aug 13 15:46:46.352 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe06:108%3]:11111, interface: london_sw0tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:46:46.353 INFO SP is prepared to reset, component: faux-mgs
reset complete
Aug 13 15:46:48.356 INFO SP reset complete, component: faux-mgs

Reset now takes 2 seconds, confirming that the RoT deposited the token.

Test Sidecar

pilot sp exec -e 'update sp 0 /staff/matt/hubris-2192/sidecar-b-dev.zip' BRM44220013
matt@castle ~ $ pilot sp exec -e 'reset' BRM44220013
Aug 13 15:59:54.435 INFO creating SP handle on interface london_sw1tp0, component: faux-mgs
Aug 13 15:59:54.437 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe05:700%15]:11111, interface: london_sw1tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 15:59:54.438 INFO SP is prepared to reset, component: faux-mgs
Aug 13 15:59:54.443 INFO using watchdog during reset, watchdog_timeout_ms: 90000, interface: london_sw1tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 16:00:22.458 INFO disabling watchdog, interface: london_sw1tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 16:00:22.462 INFO SP reset complete, component: faux-mgs

Reset took 28 seconds

Update the RoT

pilot sp exec -e "update rot 1 /staff/matt//hubris-2192/oxide-rot-1-sidecar-b.zip" BRM44220013
pilot sp exec -e "component-active-slot rot --persist --set 1" BRM44220013
pilot sp exec -e "reset-component rot" BRM44220013

Just for fun, let's do the full SP update again:

$ pilot sp exec -e 'update sp 0 /staff/matt/hubris-2192/sidecar-b-dev.zip' BRM44220013
$ pilot sp exec -e 'reset' BRM44220013
Aug 13 16:02:18.185 INFO creating SP handle on interface london_sw1tp0, component: faux-mgs
Aug 13 16:02:18.188 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe05:700%15]:11111, interface: london_sw1tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 16:02:18.189 INFO SP is prepared to reset, component: faux-mgs
Aug 13 16:02:18.194 INFO using watchdog during reset, watchdog_timeout_ms: 90000, interface: london_sw1tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 16:02:32.201 INFO disabling watchdog, interface: london_sw1tp0, socket: control-plane-agent, component: faux-mgs
Aug 13 16:02:32.206 INFO SP reset complete, component: faux-mgs
reset complete

This time it took only 14 seconds, half of the previous 28, indicating that the RoT token handoff worked.

mkeeter force-pushed the mkeeter/measurement-token-everywhere branch from 48f6555 to 1a549ad August 19, 2025 17:36
mkeeter force-pushed the mkeeter/measurement-token-everywhere branch from 1a549ad to 5f436b1 September 3, 2025 17:56

mkeeter (Collaborator, Author) commented Sep 3, 2025

Tested on Cosmo BRM22250001 after #2211; this seems to work.

After flashing just the new SP firmware, the SP takes ~26 seconds to complete a reset. This is indicative of resetting multiple times (and being measured by the RoT each time).

After flashing the new RoT firmware, SP reset time drops to 8 seconds, indicating that the RoT is interrupting the reset loop with a measurement token.

mkeeter marked this pull request as ready for review September 3, 2025 18:36
mkeeter force-pushed the mkeeter/measurement-token-everywhere branch from 5f436b1 to 79099b5 September 4, 2025 15:02
mkeeter force-pushed the mkeeter/measurement-token-everywhere branch from 79099b5 to 695a51f September 4, 2025 17:07
mkeeter merged commit b50c3c7 into master Sep 4, 2025 (135 checks passed)
mkeeter deleted the mkeeter/measurement-token-everywhere branch September 4, 2025 17:37
rusty1968 pushed a commit to rusty1968/hubris that referenced this pull request Sep 17, 2025
Follow-up to oxidecomputer#2138

This PR enables the measurement token handoff on remaining Oxide boards:

- Gimlet (except rev B)
- Sidecar
- PSC
- Cosmo

See the PR comments for an exhaustive testing log!
clockdomain pushed a commit to clockdomain/hubris that referenced this pull request Sep 26, 2025