-
Notifications
You must be signed in to change notification settings - Fork 1.9k
CPUID (1/3): Error type expansion #3104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CPUID (1/3): Error type expansion #3104
Conversation
aa315b8
to
98557e0
Compare
c3fb66c
to
cb95b31
Compare
InconsistentVcpuCount, | ||
/// The VgicSysRegsState is invalid | ||
#[error("The VgicSysRegsState is invalid.")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible to add a parameter for what the state is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would need to change the underlying function for this. Where practical I have tried not to change too much functionality.
#[derive(Debug, thiserror::Error)] | ||
pub enum SetMSRsError { | ||
/// Failed to create [`vmm_sys_util::fam::FamStructWrapper`] for MSRs. | ||
#[error("Could not create `vmm_sys_util::fam::FamStructWrapper` for MSRs")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the context for this error may give a hint as to the reason why this may happen, but I don't think this would be a useful error to see in a log. Some context as to why the error happened would be ideal, otherwise dropping this error in favour of the more generic error would seem more appropriate to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with giving further context here is that since fam::Error
doesn't implement std::error::Error
we can't display it nicely (I recently got a PR merged fixing this rust-vmm/vmm-sys-util#168 but we would need to update the version for this).
In this context I'd argue that we leave it in the most descriptive state, considering this will form an error message like:
Failed to boot microVM: Failed to initialize vCPU: Failed to set MSRs: Could not create
vmm_sys_util::fam::FamStructWrapper
for MSRs
When we update our version we can update here to add further context.
It would be useful to remind someone in the future to add this context. Do you know if there is there any recommended approach here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this context I'd argue that we leave it in the most descriptive state
My concern is that for a Firecracker user, this error isn't descriptive. Parameterising a more specific message from where the error occurs so that the user can troubleshoot would be a better way to go here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is as specific as we can go, all we know from Msrs::from_entries(&msr_entries)
is that an error occurred. fam::Error
contains this description, but for now, we cannot display it (we need there to be a new version of the crate containing fam::Error
so as to expose the std::error::Error
implementation, so we can then display it).
While it would be ideal that every error would be descriptive enough to highlight solutions I think for the moment here it cannot be well done and this is still an improvement over the existing even less specific error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the context of the error here might clarify the usefulness of it. A less specific but general error like "Internal error" for example would at least indicate a system error versus a user error. The current description is for a developer's use, not for the user of Firecracker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A less specific but general error like "Internal error" for example would at least indicate a system error versus a user error. The current description is for a developer's use, not for the user of Firecracker.
I think this distinction is difficult to draw and somewhat unnecessary. If we have a descriptive error we can then on the front-end present the user with a simple "Internal error"
. I do not see the argument for having a less descriptive error (in the same sense I would not see an argument for having less descriptive documentation).
Apologies if I am misunderstanding.
@@ -282,20 +282,44 @@ pub fn create_boot_msr_entries() -> Vec<kvm_msr_entry> { | |||
] | |||
} | |||
|
|||
/// Error type for [`set_msrs`]. | |||
#[derive(Debug, thiserror::Error)] | |||
pub enum SetMSRsError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I suggest reframing the error naming that this type encapsulates? My recommendation would be to rename the enum to MSRConfigurationError
? (or MSRAccessError
, then it could contain errors for reading and writing, unless you anticipate a large variety of error types for each category of cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current pattern at the moment is to prefix with the function name. I tend to prefer this as it best illustrates they are bound (and this error is not used elsewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this mean that there would potentially be a GetMSRsError eventually as well?
The naming here of an error class (enum) associated with each single function seems like overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this mean that there would potentially be a GetMSRsError eventually as well?
Most likely.
The naming here of an error class (enum) associated with each single function seems like overkill?
This is a somewhat opinionated area. In the Rust standard library we see instances of both function specific error types and shared error types. I lean towards specific error types as this provides better documentation and traceability. There is a balance here, if functions return mostly similar errors it may make sense to use a shared error type, otherwise function specific error types may make sense. I find error types act like documentation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO If SetMSR
and GetMSR
have identical/similar errors, they should use the same Error type as you suggest also, so maybe something MSRConfigError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting here, there wouldn't be a GetMSRsError
in Firecracker since there is no get_msrs
in Firecracker (this function is within rust-vmm).
@@ -95,7 +95,7 @@ pub enum StartMicrovmError { | |||
/// Unable to set VmResources. | |||
SetVmResources(VmConfigError), | |||
} | |||
|
|||
impl std::error::Error for StartMicrovmError {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Is the builder.rs
file the appropriate place for this type? I have not looked at this specific crate's builder file before, but it seems to be bloated with a lot of code that may be best put elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -441,6 +442,65 @@ pub fn build_microvm_for_boot( | |||
Ok(vmm) | |||
} | |||
|
|||
/// Error type for [`build_microvm_from_snapshot`]. | |||
#[derive(Debug, thiserror::Error)] | |||
pub enum BuildMicrovmFromSnapshotError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: I realize there are other error types in this builder.rs
file already, but I can't help but wonder if there isn't a better spot for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like having error types be closely bound to the function/s they correspond to, I find it increases readability.
I think the issue here is rather that builder.rs
is just too big to begin with, potentially some functions could be moved into sub-modules (although I think this may be outside the scope of this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, builder is a monster. I think it needs re-working and there are some ideas orthogonal to this PR that might do exactly that.
@@ -65,7 +65,36 @@ pub enum Error { | |||
RestoreGic(arch::aarch64::gic::Error), | |||
} | |||
|
|||
impl Display for Error { | |||
/// Error type for [`Vm::restore_state`] | |||
#[cfg(target_arch = "x86_64")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously not too keen on differentiating coding styles based on the architecture to be built for. Is this not generalizable for x86_64
and aarch64
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unforgettably here since the function is entirely different between architectures so must the error types be different (it could be generalized via having the error type having all variants for both architectures, but this would result in unused code and be less clear).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function wasn't different previously, I don't follow why the function should be different given the scope of the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function restore_store
did have an x86_64
and aarch64
variant before:
#[cfg(target_arch = "x86_64")]
/// Restores the Kvm Vm state.
pub fn restore_state(&self, state: &VmState) -> Result<()> {
self.fd
.set_pit2(&state.pitstate)
.map_err(Error::VmSetPit2)?;
self.fd.set_clock(&state.clock).map_err(Error::VmSetClock)?;
self.fd
.set_irqchip(&state.pic_master)
.map_err(Error::VmSetIrqChip)?;
self.fd
.set_irqchip(&state.pic_slave)
.map_err(Error::VmSetIrqChip)?;
self.fd
.set_irqchip(&state.ioapic)
.map_err(Error::VmSetIrqChip)?;
Ok(())
}
#[cfg(target_arch = "aarch64")]
pub fn save_state(&self, mpidrs: &[u64]) -> Result<VmState> {
Ok(VmState {
gic: self
.get_irqchip()
.save_device(mpidrs)
.map_err(Error::SaveGic)?,
})
}
#[cfg(target_arch = "aarch64")]
pub fn restore_state(&self, mpidrs: &[u64], state: &VmState) -> Result<()> {
self.get_irqchip()
.restore_device(mpidrs, &state.gic)
.map_err(Error::RestoreGic)
}
"JAILER_BINARY_SIZE_TARGET": 797632, | ||
"FC_BINARY_SIZE_LIMIT": 2485719, | ||
"FC_BINARY_SIZE_LIMIT": 2646663, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This increase is significant (2646663 − 2485719 = 160944 bytes (~157.7 KB)). This equates to roughly 6% increase in the binary size. I'm assuming that is all due to thiserror
being added as a dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thiserror
is already included as a transitive dependency via userfaultfd
and vm-allocator
so this wouldn't contribute additionally to the binary size. The increase is a result of the static strings uses for the std::fmt::Display
implementations for the error types and some of the additional code generated with std::error::Error
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of a library in a dependency wouldn't automatically result in the library being linked in. I would have expected the net delta in the binary size as a result of the code change in the immediate repository to be better not worse. The change here should be entirely related to the use of thiserror
locally.
Regardless, as discussed offline, the change to the size target/limit here is not exclusively owed to the use of thiserror
due to the 5% tolerance margin. It is just that the change here finally breached the threshold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the test_binary_size.py
locally and came up with a change of (2520632 - 2455280) = 65352
which is less serious.
195a973
to
bf81dcd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This touches far more than CPUID. General comment is reducing the scope to maybe just CPUID crate would have been a simpler review, and make future introduction of thiserror more gradual.
@@ -11,6 +11,7 @@ serde = { version = ">=1.0.27", features = ["derive"] } | |||
serde_derive = ">=1.0.27" | |||
serde_json = ">=1.0.9" | |||
derive_more = { version = "0.99.17", default-features = false, features = ["from"] } | |||
thiserror = "1.0.32" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to make this >= ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With >=
we cannot guarantee an author won't make a change that breaks our build. = "1.0.32"
is the only way to fully guarantee this, we could loosen it with = "1"
instead if we trust they stick to SemVer.
I'm not strongly opinionated here, I'm not sure if we have a policy on this (we should probably make one if not). What do you think is best?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do people tend to break semver? I would assume that if 1
does not become 2
we should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, quite often. Even if they intend to abide SemVer (which they often don't) since their is no common automated checking mechanism (rust-vmm broke it recently). I have no strong opinion here, do you think we should go with thiserror="1"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't have a strong opinion. Maybe we could play safe and leave it as it is here.
bf81dcd
to
e07def0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow. This was big 😛
I really like the changes apart from minor comments.
@@ -11,6 +11,7 @@ serde = { version = ">=1.0.27", features = ["derive"] } | |||
serde_derive = ">=1.0.27" | |||
serde_json = ">=1.0.9" | |||
derive_more = { version = "0.99.17", default-features = false, features = ["from"] } | |||
thiserror = "1.0.32" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do people tend to break semver? I would assume that if 1
does not become 2
we should be fine.
@@ -9,11 +9,13 @@ use kvm_bindings::kvm_lapic_state; | |||
use kvm_ioctls::VcpuFd; | |||
use utils::byte_order; | |||
/// Errors thrown while configuring the LAPIC. | |||
#[derive(Debug)] | |||
#[derive(Debug, thiserror::Error, PartialEq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the PartialEq
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and in other Error types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kvm_ioctls::Error
doesn't implement std::cmp::Eq
unfortunately, so we cannot derive it on types which contain it (deriving PartialEq
is still useful so we can do ErrorX == ErrorY
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need it in the first place? Does thiserror require it?
@@ -282,20 +282,44 @@ pub fn create_boot_msr_entries() -> Vec<kvm_msr_entry> { | |||
] | |||
} | |||
|
|||
/// Error type for [`set_msrs`]. | |||
#[derive(Debug, thiserror::Error)] | |||
pub enum SetMSRsError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO If SetMSR
and GetMSR
have identical/similar errors, they should use the same Error type as you suggest also, so maybe something MSRConfigError
?
/// Error type for [`setup_fpu`]. | ||
#[derive(Debug, derive_more::From, PartialEq)] | ||
pub struct SetupFpuError(utils::errno::Error); | ||
impl std::error::Error for SetupFpuError {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not thiserror
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thiserror
doesn't seem to be able to derive on new types (new types being struct X(Y)
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link?
@@ -441,6 +442,65 @@ pub fn build_microvm_for_boot( | |||
Ok(vmm) | |||
} | |||
|
|||
/// Error type for [`build_microvm_from_snapshot`]. | |||
#[derive(Debug, thiserror::Error)] | |||
pub enum BuildMicrovmFromSnapshotError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, builder is a monster. I think it needs re-working and there are some ideas orthogonal to this PR that might do exactly that.
e07def0
to
ae50e26
Compare
3bad40a
ae50e26
to
3bad40a
Compare
Signed-off-by: Jonathan Woollett-Light <[email protected]>
Signed-off-by: Jonathan Woollett-Light <[email protected]>
Signed-off-by: Jonathan Woollett-Light <[email protected]>
3bad40a
to
e4a5778
Compare
Reason for This PR
Error handling around snapshotting and CPUID relies on large error incomplete error types that can be difficult to understand and use appropriately. In addition, these larger types make distinguishing from which function an error arose difficult and make it unclear the possible errors functions can produce.
Description of Changes
Adjusts error handling around snapshotting and CPUID to be more explicit and allow easier expansion.
Dependency justification:
thiserror
thiserror
is a convenience crate exposing a single derive proc-macro to help with defining errors.thiserror
is used here and typically to reduce repetitive code. If this was widely applied throughout the Firecracker code base it could remove thousands of lines of repetitive code, reducing manually written code and thus the likelihood of errors within this code.thiserror
is extensively tested although has no documented testing or security policy,thiserror
is (at the time of writing) the 84th most popular crate on crates.io with ~51 million downloads. The 1st version was released ~3 years ago, and the most recent version was released 22 days ago.We already depend on
thiserror
as a transitive dependency ofuserfaultfd
andvm-allocator
.Test coverage decrease
The decrease in test coverage is a result of the increased code generated for the
std::error::Error
implementations for error types. Some of this is unused as not all error types are printed in testing.Binary size increase
The binary size increase is a result of the additional code generated and the static strings associated with this code. This size difference is as small as ~12kb.
rust-vmm
.License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.
PR Checklist
git commit -s
).unsafe
code is properly documented.CHANGELOG.md
.