Half of our Mac VMs aren't running.
On our Mac VMware cluster, on the Linux VM that runs x/build/cmd/makemac in a systemd unit:
# journalctl -f -u makemac
...
Feb 15 14:19:06 godns makemac[24540]: $ govc device.usb.add -vm mac_10_11_host10a
Feb 15 14:19:07 godns makemac[24540]: $ govc vm.disk.attach -vm mac_10_11_host10a -link=true -persist=false -ds=Pure1-1 -disk osx_11_frozen/osx_11_frozen.vmdk
Feb 15 14:19:07 godns makemac[24540]: $ govc vm.destroy mac_10_11_host10a
Feb 15 14:19:08 godns makemac[24540]: 2018/02/15 14:19:08 Error creating 10.11: govc vm.disk.attach ...: exit status 1, govc: Invalid configuration for device '0'.
Feb 15 14:19:13 godns makemac[24540]: 2018/02/15 14:19:13 Have capacity for 8 more Mac VMs; creating requested 10.10 ...
Feb 15 14:19:14 godns makemac[24540]: $ govc vm.create -m 4096 -c 6 -on=false -net dvPortGroup-Private -g darwin14_64Guest -ds BOOT_8 mac_10_10_host08a
Feb 15 14:19:16 godns makemac[24540]: $ govc vm.change -e smc.present=TRUE -e ich7m.present=TRUE -e firmware=efi -e guestinfo.key-darwin-amd64-10_10=xx -e guestinfo.name=mac_10_10_host08a -vm mac_10_10_host08a
Feb 15 14:19:17 godns makemac[24540]: $ govc device.usb.add -vm mac_10_10_host08a
Feb 15 14:19:18 godns makemac[24540]: $ govc vm.disk.attach -vm mac_10_10_host08a -link=true -persist=false -ds=Pure1-1 -disk osx_10_frozen/osx_10_frozen.vmdk
Feb 15 14:19:18 godns makemac[24540]: $ govc vm.destroy mac_10_10_host08a
Feb 15 14:19:18 godns makemac[24540]: 2018/02/15 14:19:18 Error creating 10.10: govc vm.disk.attach ...: exit status 1, govc: Invalid configuration for device '0'.
...
Notice all the "govc: Invalid configuration for device '0'." errors.
Why did this start failing? This has been running unmodified for about 18 months.
Investigate.
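To gauge how widespread the failure is, the journal output above can be tallied by macOS version and failing govc step. The following is a minimal sketch, not part of makemac: the helper name countCreateErrors and the regular expression are assumptions based only on the "Error creating <version>: govc <subcommand>" lines shown in the log excerpt.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"regexp"
	"sort"
)

// createErr matches lines shaped like the ones in the excerpt above:
//   "Error creating 10.11: govc vm.disk.attach ...: exit status 1, ..."
// The pattern is an assumption derived from that excerpt only.
var createErr = regexp.MustCompile(`Error creating ([0-9.]+): govc (\S+)`)

// countCreateErrors (hypothetical helper) returns how many creation
// failures appear in log, keyed by "<version> <govc subcommand>".
func countCreateErrors(log string) map[string]int {
	counts := make(map[string]int)
	for _, m := range createErr.FindAllStringSubmatch(log, -1) {
		counts[m[1]+" "+m[2]]++
	}
	return counts
}

func main() {
	// Read the whole journal dump from standard input.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	counts := countCreateErrors(string(data))

	// Print in sorted key order so the output is stable between runs.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s\t%d\n", k, counts[k])
	}
}
```

Assuming the file is saved as tally.go (a hypothetical name), it could be fed from the same unit's journal with: journalctl -u makemac | go run tally.go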
bradfitz commented on Feb 15, 2018
Original bug report I should've used was #23856.
gopherbot commented on Feb 15, 2018
Change https://golang.org/cl/94601 mentions this issue:
dashboard: disable Mac trybots for now
gopherbot commented on Feb 21, 2018
Change https://golang.org/cl/95735 mentions this issue:
dashboard: adjust how many Mac VMs we expect
bradfitz commented on Feb 21, 2018
Logged in and poked around. It seems our vSphere/vCenter/vWhatever crapped itself and ran out of disk space for something and then went downhill fast into a weird state.
The MacStadium folk are cleaning it up.
bradfitz commented on Feb 21, 2018
MacStadium said they fixed something, but I still see 5 alerts.
But upon poking around more, I found that 4 of our 10 physical nodes had lost their connections to the shared NFS datastore. I had to manually remount those.
No clue why they became unmounted or why manual action was required to repair it.
But it all seems to be working again, even with VMware still alerting about stuff.
I'm following up with MacStadium about that. (https://portal.macstadium.com/tickets/47331)
/cc @andybons
bradfitz commented on Feb 21, 2018
And I see all 20 back up & connected.
I'll re-enable trybots.
ianlancetaylor commented on Nov 29, 2018
Seems like this issue is fixed, so closing.