[WIP] Refactor cadvisor metrics collection #1024

Closed
ehashman wants to merge 5 commits

@ehashman ehashman commented Oct 28, 2021

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 28, 2021
@openshift-ci-robot openshift-ci-robot added the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label Oct 28, 2021
@openshift-ci-robot

@ehashman: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

@openshift-ci

openshift-ci bot commented Oct 28, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ehashman
To complete the pull request process, please assign marun after the PR has been reviewed.
You can assign the PR to them by writing /assign @marun in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the vendor-update Touching vendor dir or related files label Oct 28, 2021
@openshift-ci openshift-ci bot requested review from marun and sjenning October 28, 2021 20:11

@ehashman
Author

Maybe a mirror outage? I can't see why the RPMs aren't building:

2021-11-11T21:50:53.234378376Z localdev-rhel-8-server-ose-rpms                 0.0  B/s |   0  B     00:00    
2021-11-11T21:50:53.234720906Z Errors during downloading metadata for repository 'localdev-rhel-8-server-ose-rpms':
2021-11-11T21:50:53.234720906Z   - Curl error (6): Couldn't resolve host name for http://download.lab.bos.redhat.com/rcm-guest/puddles/RHAOS/plashets/4.10-el8/building/x86_64/os/repodata/repomd.xml [Could not resolve host: download.lab.bos.redhat.com]
2021-11-11T21:50:53.234904186Z Error: Failed to download metadata for repo 'localdev-rhel-8-server-ose-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

/test e2e-gcp

@ehashman
Author

/retest-required

@ehashman
Author

ehashman commented Nov 15, 2021

Hmmmm

ehashman@red-dot:~$ time curl -k http://127.0.0.1:8001/api/v1/nodes/ip-10-0-138-197.us-east-2.compute.internal/proxy/metrics/cadvisor
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="",kernelVersion="4.18.0-305.28.1.el8_4.x86_64",osVersion="Red Hat Enterprise Linux CoreOS 410.84.202111151337-0 (Ootpa)"} 1
# HELP container_scrape_error 1 if there was an error while getting container metrics, 0 otherwise
# TYPE container_scrape_error gauge
container_scrape_error 1
# HELP machine_cpu_cores Number of logical CPU cores.
# TYPE machine_cpu_cores gauge
machine_cpu_cores{boot_id="e2841504-9e1f-4ee1-b999-e905da23ebb3",machine_id="ec295e11efadc15ff5b7af0ea9cf8453",system_uuid="ec295e11-efad-c15f-f5b7-af0ea9cf8453"} 4
# HELP machine_cpu_physical_cores Number of physical CPU cores.
# TYPE machine_cpu_physical_cores gauge
machine_cpu_physical_cores{boot_id="e2841504-9e1f-4ee1-b999-e905da23ebb3",machine_id="ec295e11efadc15ff5b7af0ea9cf8453",system_uuid="ec295e11-efad-c15f-f5b7-af0ea9cf8453"} 2
# HELP machine_cpu_sockets Number of CPU sockets.
# TYPE machine_cpu_sockets gauge
machine_cpu_sockets{boot_id="e2841504-9e1f-4ee1-b999-e905da23ebb3",machine_id="ec295e11efadc15ff5b7af0ea9cf8453",system_uuid="ec295e11-efad-c15f-f5b7-af0ea9cf8453"} 1
# HELP machine_memory_bytes Amount of memory installed on the machine.
# TYPE machine_memory_bytes gauge
machine_memory_bytes{boot_id="e2841504-9e1f-4ee1-b999-e905da23ebb3",machine_id="ec295e11efadc15ff5b7af0ea9cf8453",system_uuid="ec295e11-efad-c15f-f5b7-af0ea9cf8453"} 1.631668224e+10
# HELP machine_nvm_avg_power_budget_watts NVM power budget.
# TYPE machine_nvm_avg_power_budget_watts gauge
machine_nvm_avg_power_budget_watts{boot_id="e2841504-9e1f-4ee1-b999-e905da23ebb3",machine_id="ec295e11efadc15ff5b7af0ea9cf8453",system_uuid="ec295e11-efad-c15f-f5b7-af0ea9cf8453"} 0
# HELP machine_nvm_capacity NVM capacity value labeled by NVM mode (memory mode or app direct mode).
# TYPE machine_nvm_capacity gauge
machine_nvm_capacity{boot_id="e2841504-9e1f-4ee1-b999-e905da23ebb3",machine_id="ec295e11efadc15ff5b7af0ea9cf8453",mode="app_direct_mode",system_uuid="ec295e11-efad-c15f-f5b7-af0ea9cf8453"} 0
machine_nvm_capacity{boot_id="e2841504-9e1f-4ee1-b999-e905da23ebb3",machine_id="ec295e11efadc15ff5b7af0ea9cf8453",mode="memory_mode",system_uuid="ec295e11-efad-c15f-f5b7-af0ea9cf8453"} 0
# HELP machine_scrape_error 1 if there was an error while getting machine metrics, 0 otherwise.
# TYPE machine_scrape_error gauge
machine_scrape_error 0

real	0m0.127s
user	0m0.018s
sys	0m0.007s

# HELP container_scrape_error 1 if there was an error while getting container metrics, 0 otherwise
# TYPE container_scrape_error gauge
container_scrape_error 1

Need to debug what is causing that; machine metrics appear to register okay.
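For anyone checking other nodes for the same symptom, here is a minimal sketch of a filter that lists which `*_scrape_error` gauges are non-zero in a dump like the one above. The function name `failing_scrapes` and the `SAMPLE` excerpt are illustrative only (not part of cadvisor or the kubelet), and the regex handles just the simple `name{labels} value` sample lines seen in this output, not the full exposition format.

```python
import re
from typing import List

# Matches a sample line: metric name, optional {label set}, then the value.
# This is a simplification that is good enough for the output shown above.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def failing_scrapes(exposition: str) -> List[str]:
    """Return names of *_scrape_error gauges whose value is non-zero."""
    failing = []
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, _labels, value = match.groups()
        if name.endswith("_scrape_error") and float(value) != 0:
            failing.append(name)
    return failing


# Excerpt of the /metrics/cadvisor response quoted in this comment.
SAMPLE = """\
# HELP container_scrape_error 1 if there was an error while getting container metrics, 0 otherwise
# TYPE container_scrape_error gauge
container_scrape_error 1
# HELP machine_scrape_error 1 if there was an error while getting machine metrics, 0 otherwise.
# TYPE machine_scrape_error gauge
machine_scrape_error 0
machine_cpu_cores{boot_id="e2841504-9e1f-4ee1-b999-e905da23ebb3"} 4
"""

if __name__ == "__main__":
    print(failing_scrapes(SAMPLE))
```

Piping the `curl` response into this would flag only the container collector here, which matches the observation that the machine metrics are fine.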

@openshift-ci

openshift-ci bot commented Nov 15, 2021

@ehashman: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-csi | 25cfe5c | link | false | /test e2e-aws-csi |
| ci/prow/e2e-agnostic-cmd | 25cfe5c | link | false | /test e2e-agnostic-cmd |
| ci/prow/verify-commits | 25cfe5c | link | true | /test verify-commits |
| ci/prow/verify | 25cfe5c | link | true | /test verify |
| ci/prow/e2e-aws-fips | 25cfe5c | link | true | /test e2e-aws-fips |
| ci/prow/k8s-e2e-conformance-aws | 25cfe5c | link | true | /test k8s-e2e-conformance-aws |
| ci/prow/e2e-aws-serial | 25cfe5c | link | true | /test e2e-aws-serial |
| ci/prow/e2e-gcp | 25cfe5c | link | true | /test e2e-gcp |
| ci/prow/e2e-gcp-upgrade | 25cfe5c | link | true | /test e2e-gcp-upgrade |
| ci/prow/k8s-e2e-gcp | 25cfe5c | link | true | /test k8s-e2e-gcp |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 21, 2021
@openshift-ci

openshift-ci bot commented Nov 21, 2021

@ehashman: PR needs rebase.

@ehashman
Author

Moving the investigation to kubernetes#106334, since it's a little easier to work upstream. Found that the source of the error was google/cadvisor#2974 (comment).

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 23, 2022
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 25, 2022
@ehashman ehashman closed this Apr 14, 2022
Labels
backports/unvalidated-commits: Indicates that not all commits come to merged upstream PRs.
do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
vendor-update: Touching vendor dir or related files