[ws-manager] The container could not be located when the pod was terminated #12021

Closed
@utam0k

Description

Bug description

We have observed an error in production, in preview-env, and in integration tests that leaves the workspace with the status "The container could not be located when the pod was terminated". We checked the related GCP log and found no data loss.

Questions

  • Is "The container could not be located when the pod was deleted. The container used to be Running" also happening?

    Yes! Check this GCP log.

  • Is this only happening on stop? Milan's scenario seems to indicate otherwise.

Plan

  1. Verify whether, in the current state, data is lost when this error occurs.
  2. Check Milan's case to understand whether it happened while the workspace was in the running phase.

Old description

We try to get the status of the pod when it is no longer running, at a point in time we cannot determine. Logs.

Impact to the user:
(1) The workspace is generally left in a failed state. Users can try to restart it, since failed is a terminal phase.
(2) User data may be lost.

This error message ("The container could not be located when the pod was terminated") comes from kubelet.
https://github.com/kubernetes/kubernetes/blob/4aa451e8458a7cbf78ed464e9e47e87d424541ce/pkg/kubelet/kubelet_pods.go#L1810-L1817

Potentially related to this Kubernetes bug: kubernetes/kubernetes#104107
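
To make the research easier to scope, here is a minimal sketch of how the two messages quoted in this issue could be detected from container statuses. It matches only on the message prefix the two variants share. The types `terminatedState` and `containerStatus` and the function `lostContainers` are hypothetical stand-ins for illustration, not ws-manager or Kubernetes API types.

```go
package main

import "strings"

// lostContainerPrefix is the text shared by both messages quoted in this
// issue ("... was terminated" and "... was deleted. ...").
const lostContainerPrefix = "The container could not be located when the pod was "

// terminatedState is a hypothetical, trimmed-down stand-in for the
// terminated state reported for a container.
type terminatedState struct {
	Message string
}

// containerStatus is a hypothetical stand-in for one container status entry.
type containerStatus struct {
	Name       string
	Terminated *terminatedState // nil while the container has not terminated
}

// lostContainers returns the names of the containers whose termination
// message starts with lostContainerPrefix, in input order.
func lostContainers(statuses []containerStatus) []string {
	var names []string
	for _, cs := range statuses {
		if cs.Terminated == nil {
			continue // still running or waiting, nothing to inspect
		}
		if strings.HasPrefix(cs.Terminated.Message, lostContainerPrefix) {
			names = append(names, cs.Name)
		}
	}
	return names
}
```

Matching on the message text is brittle if kubelet changes the wording, so this is only meant as a starting point for counting how often each variant occurs.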

Steps to reproduce

I don't know

Workspace affected

No response

Expected behavior

This error message should not appear in production.

Example repository

No response

Anything else?

This has been happening in gen59, gen60 and gen61, too. Logs.

Definition of done

Let's spend some time researching if this is a Kubernetes bug, or in fact could be caused by other circumstances too. Please timebox at 2 hours, after which please share results with the team in Slack, so we can socialize next steps.

Why research? Because the workspaces impacted by this bug end with a Failed status. cc @geropl I'm not sure whether a workspace ending in a Failed status will negatively impact UBP. I assume not, but wanted to check.
