Skip to content

Conversation

danwinship
Copy link
Contributor

Seen in https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/periodic-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-dualstack-periodic/1299134739436802048/. There is some larger failure mode here which is not yet diagnosed, but on the way to figuring that out I saw that the current error-handling code in the dns-node-resolver pod is not working, resulting in the container crashing with:

2020-08-28T01:57:14.539502724Z /bin/bash: line 38: svc_ips: unbound variable

rather than sleeping and looping

@danwinship
Copy link
Contributor Author

lol wut

2020-08-28T17:49:11.033552409Z /bin/bash: line 51: cmp: command not found
2020-08-28T17:50:11.058408167Z /bin/bash: line 51: cmp: command not found
2020-08-28T17:51:11.080578023Z /bin/bash: line 51: cmp: command not found
2020-08-28T17:52:11.111326141Z /bin/bash: line 51: cmp: command not found
2020-08-28T17:53:11.127450483Z /bin/bash: line 51: cmp: command not found
2020-08-28T17:54:11.142883458Z /bin/bash: line 51: cmp: command not found
2020-08-28T17:55:11.157156676Z /bin/bash: line 51: cmp: command not found
...

was the check broken before and always false? Or maybe this is caused by the RHEL 8 migration?

Under "set -u", if svc_ips is declared-but-empty, then
"${#svc_ips[@]}" results in an "unbound variable" error (which, under
"set -e" causes the script to exit with an error). However,
"${svc_ips[@]}" expands to the empty string, as expected.
@danwinship
Copy link
Contributor Author

So yeah, the RHCOS 8 images appear to not have cmp (or diff; same package, diffutils), while the RHCOS 7 images did. Possible RHCOS bug? (@jupierce? @alvaroaleman?)

You could do something like:

if [[ "$(cat ${TEMP_FILE} | sha256sum)" != "$(cat ${HOSTS_FILE} | sha256sum)" ]]; then
  cp -f "${TEMP_FILE}" "${HOSTS_FILE}"
fi

(The cat+pipe is needed because otherwise sha256sum includes the filename in the output which isn't what we want.)

@alvaroaleman
Copy link

alvaroaleman commented Aug 29, 2020

Sorry, I know nothing about how rhcos 7 and 8 differ. @jupierce and @yselkowitz are the best candidates for questions around that.

@yselkowitz
Copy link

In what container is the script which calls cmp actually run (asking because some operators provide a script but run it in a different container)?

@danwinship
Copy link
Contributor Author

oh, duh, I'm mixing up RHCOS and UBI... I was thinking we couldn't install extra RPMs by hand but obviously we can

@danwinship danwinship force-pushed the unbound-svc-ips branch 2 times, most recently from 4530ed8 to 5c7efff Compare August 31, 2020 12:28
@danwinship
Copy link
Contributor Author

In what container is the script which calls cmp actually run (asking because some operators provide a script but run it in a different container)?

Looks like "openshift/origin-cli:v4.0" or whatever that gets magically turned into.

@yselkowitz
Copy link

@jupierce should diffutils be explicitly added to base for this?

Copy link

@yselkowitz yselkowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the script is run in cli, then installing cmp in this operator isn't going to help.

Dockerfile Outdated
FROM registry.svc.ci.openshift.org/ocp/4.6:base

# dns-node-resolver needs "cmp"
RUN dnf install -y diffutils && dnf clean all

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend yum instead of dnf.

@yselkowitz
Copy link

openshift/images#35 should fix this via inheritance.

@danwinship
Copy link
Contributor Author

If the script is run in cli, then installing cmp in this operator isn't going to help.

yeah, I hadn't realized it was using a different image when I pushed that commit

@danwinship
Copy link
Contributor Author

(I later un-pushed that commit)

@danwinship
Copy link
Contributor Author

/retest

@yselkowitz
Copy link

/lgtm
/retest

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 3, 2020
@yselkowitz
Copy link

/retitle Bug 1872080: Avoid a bash "unbound variable" error

@openshift-ci-robot openshift-ci-robot changed the title Avoid a bash "unbound variable" error Bug 1872080: Avoid a bash "unbound variable" error Sep 3, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Sep 3, 2020
@openshift-ci-robot
Copy link
Contributor

@danwinship: This pull request references Bugzilla bug 1872080, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1872080: Avoid a bash "unbound variable" error

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Sep 3, 2020
@yselkowitz
Copy link

/assign @ironcladlou

@sgreene570
Copy link
Contributor

Looks like theres also the possibility of an unbound variable error here:

if [[ "$?" -eq 0 && "${#ips[@]}" -ne 0 ]]; then

svc_ips is derived from ips.
@danwinship mind addressing that statement in this PR as well?

@sgreene570
Copy link
Contributor

Ah, nevermind, after further reading, the ips variable shouldn't ever be unbound.

for i in ${!cmds[*]}
do
ips=($(eval "${cmds[i]}"))

/approve

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, sgreene570, yselkowitz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 3, 2020
@crawford
Copy link

crawford commented Sep 3, 2020

/retest

@yselkowitz
Copy link

/hold
This might not be enough after all, testing now.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 3, 2020
@sgreene570
Copy link
Contributor

/hold
This might not be enough after all, testing now.

Could you elaborate on whats going wrong?

@yselkowitz
Copy link

/hold cancel
Never mind, problem is unrelated.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 3, 2020
@danwinship
Copy link
Contributor Author

The logs are still showing /bin/bash: line 51: cmp: command not found (eg https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-dns-operator/192/pull-ci-openshift-cluster-dns-operator-master-e2e-aws/1300410258996334592/artifacts/e2e-aws/gather-extra/pods/openshift-dns_dns-default-2xq66_dns-node-resolver.log) but that's technically a separate problem from the one being fixed in this PR...

# We will not update /etc/hosts when there is coredns service outage or api unavailability
# Stale entries could exist in /etc/hosts if the service is deleted
if [[ "${#svc_ips[@]}" -ne 0 ]]; then
if [[ -n "${svc_ips[*]}" ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? This works for me on RHEL76:

$ echo $BASH_VERSION
4.2.46(2)-release
$ set -u
$ declare -A svc_ips
$ [[ "${#svc_ips[@]}" -ne 0 ]]
$ echo $?
1
$

Do we need to report a BZ against RHEL8's Bash?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much newer bash, probably expected

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed BZ#1875566, "Bash says a declared but empty array is unbound".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior in 4.2 was a bug. It was fixed incorrectly in 4.3 (making both ${#svc_ips[@]} and ${svc_ips[@]} an error) and then fully fixed in 4.4 (making the former an error but the latter not). This has to do with POSIX compliance or something.

Copy link
Contributor

@sgreene570 sgreene570 Sep 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR seems to have fixed clusters using the RHEL 8 base, but broke clusters using the prior RHEL 7 base:
eg

sh-4.2# echo $BASH_VERSION
4.2.46(2)-release
sh-4.2# set -u
sh-4.2# declare -A svc_ips
sh-4.2# [[-n "${svc_ips[*]}"]]
sh: svc_ips[*]: unbound variable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would [[ -n "${svc_ips[*]-}" ]] work on both Bash 4.2 and 4.4?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would [[ -n "${svc_ips[*]-}" ]] work on both Bash 4.2 and 4.4?

I think so... what exactly does the - change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${parameter-default} means to substitute the value of the variable parameter if it is set, or the literal value default (which in our case is the empty string) if parameter is not set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I had tried changing the original ${#svc_ips[@]} to ${#svc_ips[@]:-0}, which does not work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#193 changes the test to [[ -n "${svc_ips[*]-}" ]].

@yselkowitz
Copy link

missing cmp would be indicative of a stale cache, problem has already been fixed in 4.6:base

@sgreene570
Copy link
Contributor

@openshift-merge-robot openshift-merge-robot merged commit 6b4360e into openshift:master Sep 3, 2020
@openshift-ci-robot
Copy link
Contributor

@danwinship: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state.

Bugzilla bug 1872080 has not been moved to the MODIFIED state.

In response to this:

Bug 1872080: Avoid a bash "unbound variable" error

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants