[nexus] Snarf ereports from CRDB into support bundles (#8739)
PR #8269 added CRDB tables for storing ereports received from both
service processors and the sled host OS. These ereports are generated to
indicate a fault or other important event, so they contain information
that's probably worth including in support bundles. So we should do
that.
This branch adds code to the `SupportBundleCollector` background task
for querying the database for ereports and putting them in the bundle.
This, in turn, required adding code for querying ereports over a
specified time range. The `BundleRequest` can be constructed with a set
of filters for ereports, including the time window, and a list of serial
numbers to collect ereports from. Presently, we always just use the
default: we collect ereports from all serial numbers from the last 7
days prior to bundle collection. But, I anticipate that this will be
used more in the future when we add a notion of targeted support
bundles: for instance, if we generate a support bundle for a particular
sled, we would probably only grab ereports from that sled.
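
As a rough illustration of the filtering described above, here is a minimal sketch of a filter set with a time window and an optional list of serial numbers. The type name `EreportFilters`, its fields, and its methods are hypothetical and are not taken from the actual `BundleRequest` code; the only details drawn from the description are the 7-day default window and the "all serial numbers" default.

```rust
use std::collections::BTreeSet;
use std::time::{Duration, SystemTime};

/// Hypothetical filter set for ereport collection (names are illustrative).
#[derive(Debug, Clone)]
struct EreportFilters {
    /// Start of the collection window (inclusive).
    start: SystemTime,
    /// End of the collection window (inclusive).
    end: SystemTime,
    /// Serial numbers to collect from; an empty set means "all serials".
    serials: BTreeSet<String>,
}

impl EreportFilters {
    /// The default described above: every serial number, over the
    /// 7 days before `now` (the time of bundle collection).
    fn default_at(now: SystemTime) -> Self {
        let week = Duration::from_secs(7 * 24 * 60 * 60);
        Self { start: now - week, end: now, serials: BTreeSet::new() }
    }

    /// Narrow the filter to one more serial number, e.g. for a
    /// bundle targeted at a single sled.
    fn for_serial(mut self, serial: &str) -> Self {
        self.serials.insert(serial.to_string());
        self
    }

    /// Whether an ereport from `serial` collected at `collected_at`
    /// falls inside both the time window and the serial filter.
    fn matches(&self, serial: &str, collected_at: SystemTime) -> bool {
        let in_window = collected_at >= self.start && collected_at <= self.end;
        let serial_ok = self.serials.is_empty() || self.serials.contains(serial);
        in_window && serial_ok
    }
}
```

A targeted bundle for one sled could then be expressed as `EreportFilters::default_at(now).for_serial(serial)`, leaving the default path unchanged.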
Ereports are stored in an `ereports` directory in the bundle, with
subdirectories for each serial number that emitted an ereport. Each
serial number directory has a subdirectory for each ereport restart ID
of that serial, and the individual ereports are stored within the
restart ID directory as JSON files. The path to an individual ereport
will be `ereports/${SERIAL_NUMBER}/${RESTART_ID}/${ENA}.json`. I'm open
to changing this organization scheme if others think there's a better
approach --- for example, we could place the restart ID in the filename
rather than in a subdirectory if that would be more useful.
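
The directory layout above can be sketched as a small path-building helper. The function name `ereport_path` is hypothetical, and the serial number, restart ID, and ENA are treated as plain strings here, which is an assumption about how they are rendered rather than something stated in this description.

```rust
use std::path::PathBuf;

/// Build the in-bundle path for one ereport:
/// `ereports/{serial}/{restart_id}/{ena}.json`.
///
/// Assumes each argument is a single, relative path component
/// (no separators and no leading `/`).
fn ereport_path(serial: &str, restart_id: &str, ena: &str) -> PathBuf {
    let mut path = PathBuf::from("ereports");
    path.push(serial);
    path.push(restart_id);
    path.push(format!("{ena}.json"));
    path
}
```

Keeping the restart ID as its own directory level means all ereports from one restart of a reporter sit together; moving it into the filename would only require changing the last two `push` calls.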
Ereport collection is done in parallel to the rest of the support bundle
collection by spawning Tokio tasks to collect host OS and service
processor ereports. `tokio_util::task::AbortOnDropHandle` is used to
wrap the `JoinHandle`s for these tasks to ensure they're aborted if the
ereport collection future is dropped, so that we stop collecting
ereports if the support bundle is cancelled.
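
To illustrate the cancel-on-drop idea without depending on Tokio, here is a standard-library analogue: a guard that owns a worker thread and signals it to stop when the guard is dropped. This is not the `tokio_util` API and not the code in this branch; `CancelOnDrop` and its fields are hypothetical, and the cancellation here is cooperative (the worker polls a flag), whereas aborting a Tokio task stops it at its next yield point.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical std-only stand-in for an abort-on-drop task handle.
struct CancelOnDrop {
    /// Shared flag the worker polls to learn it should stop.
    cancelled: Arc<AtomicBool>,
    /// Handle to the worker; `Option` so `drop` can take and join it.
    handle: Option<JoinHandle<u64>>,
}

impl CancelOnDrop {
    /// Spawn a worker that pretends to collect items until cancelled.
    fn spawn() -> Self {
        let cancelled = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&cancelled);
        let handle = thread::spawn(move || {
            let mut collected = 0u64;
            while !flag.load(Ordering::Acquire) {
                collected += 1; // stand-in for collecting one ereport
                thread::sleep(Duration::from_millis(1));
            }
            collected
        });
        Self { cancelled, handle: Some(handle) }
    }
}

impl Drop for CancelOnDrop {
    /// Dropping the guard cancels the worker and waits for it to exit,
    /// so no collection work outlives the owner of the guard.
    fn drop(&mut self) {
        self.cancelled.store(true, Ordering::Release);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}
```

The design point is the same as in the description above: tying the worker's lifetime to a value owned by the collection future means that dropping that future is enough to stop the work, with no separate cancellation call to forget.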
Fixes #8649