test: skip image polling test if running in a kind cluster #2425

Closed
wants to merge 1 commit

Conversation

@exdx (Member) commented Oct 14, 2021

Signed-off-by: Daniel Sover [email protected]

Description of the change:
This change skips the image polling test when running against a kind cluster. Previously the test was skipped only on upstream CI (which also runs on kind), but not locally, so it failed when run locally against a kind cluster.

The failure seems to be because the context required for the kubectl port-forward command is not properly configured in the test. It's worth exploring whether fetching the right context at test execution time would let the test pass on kind the way it does on minikube and OpenShift.
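For orientation, here is a minimal sketch of how the guard can be wired into the spec, assuming the inKind helper added by this PR and the ginkgo/gomega dot imports the e2e suite already uses; the client variable and the skip message are placeholders, not the exact wording in the test:

ok, err := inKind(client) // client: the suite's operatorclient.ClientInterface (placeholder name)
Expect(err).ToNot(HaveOccurred())
if ok {
	Skip("image polling does not work reliably against a kind cluster")
}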

Motivation for the change:
Closes #2420

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive

@openshift-ci openshift-ci bot requested review from dinhxuanvu and hasbro17 October 14, 2021 20:14
@exdx exdx requested a review from timflannagan October 14, 2021 20:14
@dinhxuanvu (Member) left a comment

/approve

@openshift-ci bot commented Oct 14, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dinhxuanvu, exdx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 14, 2021
@timflannagan (Member)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2021
@exdx (Member, Author) commented Oct 14, 2021

I assumed the name of the kind control-plane node to be static across clusters, but it varies with the cluster name. For example, both
kind-8zcmr8sx8z49wkr5-control-plane
cat-control-plane
are valid kind node names, depending on the name provided when creating the cluster. There also don't seem to be any obvious kind-specific APIs or objects to query for. Seems like listing nodes and running a regex will be a potential solution.
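For illustration only, the regex variant mentioned above might look like the sketch below. The looksLikeKind name, the pattern, and the use of a plain kubernetes.Interface (instead of the suite's operatorclient.ClientInterface) are assumptions made to keep the example self-contained; note that, like the prefix/suffix check the PR eventually settles on, it would not catch custom cluster names such as cat-control-plane.

package e2e

import (
	"context"
	"regexp"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// kindNodePattern matches the default kind control-plane node names, e.g.
// "kind-control-plane" or "kind-8zcmr8sx8z49wkr5-control-plane".
var kindNodePattern = regexp.MustCompile(`^kind(-.+)?-control-plane$`)

// looksLikeKind lists the cluster's nodes and reports whether any of them
// follows kind's control-plane naming convention.
func looksLikeKind(kubeClient kubernetes.Interface) (bool, error) {
	nodes, err := kubeClient.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for _, node := range nodes.Items {
		if kindNodePattern.MatchString(node.GetName()) {
			return true, nil
		}
	}
	return false, nil
}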

Comment on lines 923 to 927
_, err := client.KubernetesInterface().CoreV1().Nodes().Get(context.TODO(), "kind-control-plane", metav1.GetOptions{})
if err != nil {
	if apierrors.IsNotFound(err) {
		return false, nil
	}
	// error finding node
	return false, err
}
Member

@exdx Following up on your comment in this PR mentioning that we can't rely on a static control plane node name: can we instead just list out the nodes and check whether any node name is prefixed with kind-*? We could also try something like the following:

// fire off a list command
nodes, err := ...
if err != nil {
    ...
}
if len(nodes) == 0 {
    ...
}

var isKindCluster bool
for _, node := range nodes.Items {
    if !strings.HasPrefix(node.GetName(), "kind-") {
        continue
    }
    if !strings.HasSuffix(node.GetName(), "-control-plane") {
        continue
    }
    isKindCluster = true
}

return isKindCluster, nil

Very loosely created that inline, so YMMV, but that should let us avoid needing to use any regex?

Member

(I realize you can clean up that implementation further, but the general idea of checking whether a kind-*-control-plane node is present could work?)

Member Author

I think that this is a good approach and gets the test to skip if running on just about any kind cluster.

@openshift-bot (Contributor)

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@timflannagan (Member)

/lgtm cancel

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2021
@ecordell (Member)

Seems like listing nodes and running a regex will be a potential solution

The name of the kind cluster is generated by the e2e test - couldn't that name be threaded into the test?

@exdx (Member, Author) commented Oct 15, 2021

Seems like listing nodes and running a regex will be a potential solution

The name of the kind cluster is generated by the e2e test - couldn't that name be threaded into the test?

That's an alternate approach -- it would prevent the test from running when using the make directive against a local kind cluster. However, I like to deploy OLM and run the tests via ginkgo directly, in which case the kind node name is just the default kind-control-plane; the test wouldn't be skipped in that case, I believe, since we bypass the e2e provisioner.
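Purely as a sketch of the alternative discussed here (not something this PR implements), the provisioner-generated cluster name could be threaded into the test binary via a flag. The flag name and helper below are invented for illustration, and, as noted above, this would not cover runs that bypass the e2e provisioner.

package e2e

import "flag"

// Hypothetical: the e2e provisioner would pass -kind-cluster-name=<generated name>
// when it creates a kind cluster, and leave the flag empty otherwise.
var kindClusterName = flag.String("kind-cluster-name", "", "name of the kind cluster the suite was provisioned against; empty when not running on kind")

// provisionedOnKind reports whether the provisioner declared a kind cluster.
func provisionedOnKind() bool {
	return *kindClusterName != ""
}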

Comment on lines +924 to +927
if err != nil {
	// error finding nodes
	return false, err
}
Member

Might be easier to just default to false, and avoid returning an error entirely, if we run into an error while listing out the nodes. This would allow call sites to avoid needing to check for an error (and improve readability by decreasing the number of conditional chains). Thoughts?

Member Author

Makes sense; I had it that way originally, but a failure listing the nodes is an error, and I think it should be treated as one, even if it makes the function a bit more complex.
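To make the trade-off concrete, here is a call-site comparison under the two signatures being discussed; it is illustrative only, and inKindBoolOnly is a hypothetical variant that swallows list errors rather than anything in this PR.

// With the signature in this PR, a failure to list nodes surfaces to the caller.
ok, err := inKind(client)
Expect(err).ToNot(HaveOccurred())
if ok {
	Skip("not supported on kind")
}

// With a bool-only variant, the call site is shorter, but a list failure
// silently reads as "not a kind cluster":
// if inKindBoolOnly(client) {
// 	Skip("not supported on kind")
// }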

Comment on lines +928 to +937
for _, node := range nodes.Items {
	if !strings.HasPrefix(node.GetName(), "kind-") {
		continue
	}
	if !strings.HasSuffix(node.GetName(), "-control-plane") {
		continue
	}
	return true, nil
}
return false, nil
Member

👍 I tested this logic out locally and it caught the default kind node name and other variations like `kind-123432-control-plane`, and it correctly filtered out cluster environments we don't care about in the context of this function.

@timflannagan (Member)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 15, 2021
@@ -918,3 +918,21 @@ func HaveMessage(goal string) gtypes.GomegaMatcher {
		return plan.Status.Message
	}, ContainSubstring(goal))
}

func inKind(client operatorclient.ClientInterface) (bool, error) {
	nodes, err := client.KubernetesInterface().CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
Member

We may want to also use context.Background() instead of context.TODO() here.

Member Author

Yes, I wasn't sure whether to thread a context through -- it might be OK to leave it as-is for now.
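Putting the diff fragments quoted in this conversation together, the helper reads roughly as follows. It is reassembled here for readability, assuming the e2e package's existing imports of context, strings, metav1, and operatorclient; the merged file may differ in minor details.

// inKind reports whether the suite is running against a kind cluster by
// looking for a node that follows kind's control-plane naming convention.
func inKind(client operatorclient.ClientInterface) (bool, error) {
	nodes, err := client.KubernetesInterface().CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		// error finding nodes
		return false, err
	}
	for _, node := range nodes.Items {
		if !strings.HasPrefix(node.GetName(), "kind-") {
			continue
		}
		if !strings.HasSuffix(node.GetName(), "-control-plane") {
			continue
		}
		return true, nil
	}
	return false, nil
}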

@openshift-bot (Contributor)

/retest-required

Please review the full test history for this PR and help us cut down flakes.

6 similar comments

@timflannagan (Member)

Holding so the bot doesn't go crazy.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 3, 2021
@timflannagan (Member)

Closing, as #2445 superseded this PR: there were merge conflicts here, and this is a really nice QoL improvement for running the e2e suite locally, so I wanted to push it through. Thanks for tackling this work 🎉

Labels: approved, do-not-merge/hold, lgtm
Linked issue: e2e-local "image update" in catalog_e2e_test.go fails
5 participants