Closed
Description
Hello folks,
I'm upgrading Firecracker from 0.19.0 to 0.20.0 on firecracker-containerd (firecracker-microvm/firecracker-containerd#383). One of our tests launches 100 microVMs, and it occasionally fails with a "vCPUs resume failed" error.
Apparently the test was hitting the receive timeout that Firecracker has internally (firecracker/src/vmm/src/lib.rs, line 941 in 53cf1ba).
But I'm not sure what the right way to fix the issue would be:
- Change the timeout from 100ms to 1000ms or something longer? That worked for us, but there is no guarantee that 1000ms is enough for everyone.
- No timeout? We could let clients handle the timeout. At least that is possible for firecracker-containerd.
- Don't start vCPUs in Paused mode? I don't know whether this is technically possible.
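
To make the first two options concrete, here is a minimal sketch of the pattern as I understand it. It is not Firecracker's actual code. `resume_with_timeout`, `VcpuResponse`, and `ResumeError` are hypothetical names I made up for illustration, and a sleeping thread stands in for a vCPU thread that is slow to reply on a loaded host. Passing `Some(duration)` models a longer or configurable timeout, and passing `None` models "no timeout, let the client decide".

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the reply a vCPU thread sends after resuming.
#[derive(Debug, PartialEq)]
enum VcpuResponse {
    Resumed,
}

/// Hypothetical error type; not Firecracker's real error enum.
#[derive(Debug, PartialEq)]
enum ResumeError {
    /// No reply arrived within the allowed time.
    TimedOut,
    /// The worker thread went away without replying.
    Disconnected,
}

/// Spawns a worker that replies after `reply_delay`, then waits for the reply.
/// `timeout = Some(d)` gives up after `d`; `timeout = None` blocks until the
/// worker replies or exits.
fn resume_with_timeout(
    reply_delay: Duration,
    timeout: Option<Duration>,
) -> Result<VcpuResponse, ResumeError> {
    let (tx, rx) = channel();

    // Simulated vCPU thread: slow to respond, e.g. because the host is busy.
    let worker = thread::spawn(move || {
        thread::sleep(reply_delay);
        // The receiver may already have given up; ignore the send error.
        let _ = tx.send(VcpuResponse::Resumed);
    });

    let result = match timeout {
        Some(limit) => rx.recv_timeout(limit).map_err(|e| match e {
            RecvTimeoutError::Timeout => ResumeError::TimedOut,
            RecvTimeoutError::Disconnected => ResumeError::Disconnected,
        }),
        None => rx.recv().map_err(|_| ResumeError::Disconnected),
    };

    // Wait for the worker so the sketch does not leak a thread.
    let _ = worker.join();
    result
}
```

With a reply delay of 200ms, `resume_with_timeout(Duration::from_millis(200), Some(Duration::from_millis(20)))` returns `Err(ResumeError::TimedOut)`, while the same delay with `None` returns `Ok(VcpuResponse::Resumed)`. Without a timeout, a vCPU thread that never replies would block the caller forever unless the client enforces its own deadline.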
Thanks,