Launching existing workspace fails with error message #4803
Comments
No one on my team can start workspaces right now, this is on gitpod.io
Same here, same problem as mentioned above.
This is still showing green: https://www.gitpodstatus.com/ ...
Could you please try again and let me know if it works now?
Confirming my workspace is launching without the previous error; will update with final results.
Workspace launched without issue, this is resolved for me, thanks a million!
Thanks for confirming. One of our image builders failed to build images. Everything should work again.
Just for transparency: one of our nodes got a full disk, and you folks were faster than our alerting system to catch this one 😅
Do y'all post postmortems somewhere? Because this was ongoing for quite a bit of time without any communication from Gitpod acknowledging something was broken. That's a pretty slow alert lol. It also seems this disk-full error has been around for a long time; is there a permanent fix being worked on? We lost an entire morning of work. If we are going to continue depending on Gitpod.io as our primary dev environment, we need to at the very least know you're awake and working on fixing it when it's broken! The first thing on every incident runbook for any team I've ever been a part of is: update the status page.
Hi @carlosdp, we are very sorry for the trouble we caused you and your team this morning. I wanted to take the time to shine some light on our current processes regarding incident response.

Timeline
Currently, all our post-mortems are private in our internal Notion. On https://www.gitpodstatus.com/ we post everything classified as a critical and/or major incident. The incident severity classification is based on user impact and on which services of Gitpod are affected. Usually our full-disk alerts are pretty harmless (i.e. no direct user impact), which is why, according to our runbook, we don't call every full disk an incident (and don't update the status page for it). As mentioned in this blog post, we have metrics that inform us about the usage of compute resources (e.g. disk usage), but we are still improving on metrics that directly show us user impact (e.g. workspace start failure rate).

To give you some more technical context on why you hit quite an edge case for our system: looking at our Grafana dashboards, right now we're running 89 nodes in production, spread over 7 clusters across 2 regions (US and EU). That is, already-running workspaces were not affected at all, just like workspaces using images that had already been built beforehand. In your case you needed a new workspace image built in the US region, which landed on the image builder on the node with the full disk.

We definitely want to and will get better at this. As mentioned in the blog post, we're iterating on how we do Site Reliability Engineering within the company, and being able to measure user impact clearly is at the top of the priority list. Regarding a permanent fix, we will prioritise work on #4804, which automates the clean-up of these full disks.
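For illustration, here is a minimal Go sketch of the kind of automated clean-up #4804 points at: check disk usage on a builder node and prune the oldest build directories once a threshold is crossed. The path, threshold, and pruning policy are assumptions made up for this example, not Gitpod's actual implementation.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"sort"
	"syscall"
)

// diskUsage returns the fraction of the filesystem at path that is in use.
func diskUsage(path string) (float64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, fmt.Errorf("statfs %s: %w", path, err)
	}
	total := float64(st.Blocks) * float64(st.Bsize)
	avail := float64(st.Bavail) * float64(st.Bsize)
	return 1 - avail/total, nil
}

func main() {
	const (
		buildDir  = "/var/gitpod/builds" // hypothetical build-volume location
		threshold = 0.85                 // prune once the disk is 85% full (illustrative)
	)

	entries, err := os.ReadDir(buildDir)
	if err != nil {
		log.Fatal(err)
	}
	// Sort oldest-first so we reclaim the least-recently-touched space.
	sort.Slice(entries, func(i, j int) bool {
		fi, _ := entries[i].Info()
		fj, _ := entries[j].Info()
		return fi.ModTime().Before(fj.ModTime())
	})

	for _, e := range entries {
		usage, err := diskUsage(buildDir)
		if err != nil {
			log.Fatal(err)
		}
		if usage < threshold {
			return // enough space available, nothing (more) to do
		}
		victim := filepath.Join(buildDir, e.Name())
		log.Printf("disk at %.0f%%, removing stale build %s", usage*100, victim)
		if err := os.RemoveAll(victim); err != nil {
			log.Printf("failed to remove %s: %v", victim, err)
		}
	}
}
```

Run periodically (e.g. as a cron job or a Kubernetes DaemonSet), a loop like this would have freed space before the builder started failing, which is exactly the failure mode described above.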
Thanks for the response 👍 I understand what happened now, and I see you are working toward fixing the underlying issue. The only thing I'd stress is just how important customer communication is during the incident. I've been there before, classifying whether or not an incident was "user-facing" and whether putting out comms about it is necessary. In this case, multiple customers were complaining, so there was clearly customer impact at that point.

The moment someone realizes there actually is something wrong, you gotta put a red or yellow thing on that status board, even if it has no info yet and just says "some customers are experiencing issues, we're looking into it." That simple act of being proactive will save a lot of pain and win you a lot of loyalty, in my experience. I would even go back and retroactively add this incident with a link to this postmortem, because I'm sure there's some subset of customers that didn't see this thread or the forum thread and might think you guys didn't notice there was something wrong earlier. Always better to be extremely quick with letting people know you know there's a problem; that's even more important than how quickly you remediate, imo. 😄
Bug description
I tried to launch an existing workspace (https://gitpod.io/start/#indigo-chickadee-cben01ge) that I was using last night and received the error below.
Error: build failed: rpc error: code = Internal desc = cannot create build volume: cannot create build volume:
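For context on the shape of this message: in Go gRPC services, a doubled prefix like "cannot create build volume: cannot create build volume:" typically means the same context string was added at two layers before the error was converted to codes.Internal. A minimal sketch of that pattern, assuming a full disk as the root cause (per the maintainers' comments above); the function names are illustrative, not Gitpod's actual code.

```go
package main

import (
	"fmt"
	"log"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// allocateVolume stands in for the low-level step that failed; per the
// maintainers' comments, the root cause was a full disk on a builder node.
func allocateVolume() error {
	return fmt.Errorf("no space left on device")
}

// createBuildVolume wraps the low-level failure with context (first prefix).
func createBuildVolume() error {
	if err := allocateVolume(); err != nil {
		return fmt.Errorf("cannot create build volume: %w", err)
	}
	return nil
}

// startBuild plays the RPC-handler layer: it adds the same context again
// (producing the doubled prefix) and converts the error to codes.Internal.
func startBuild() error {
	if err := createBuildVolume(); err != nil {
		return status.Errorf(codes.Internal, "cannot create build volume: %v", err)
	}
	return nil
}

func main() {
	if err := startBuild(); err != nil {
		// Prints: build failed: rpc error: code = Internal desc = cannot
		// create build volume: cannot create build volume: no space left on device
		log.Println("build failed:", err)
	}
}
```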
Steps to reproduce
Start https://gitpod.io/start/#indigo-chickadee-cben01ge and receive the error above.
Expected behavior
Launching normally
Example repository
No response
Anything else?
No response