Skip to content

LastHeartbeatTime is not update when plugin does not have a condition #306

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
grosser opened this issue Jul 8, 2019 · 24 comments · Fixed by #320
Closed

LastHeartbeatTime is not update when plugin does not have a condition #306

grosser opened this issue Jul 8, 2019 · 24 comments · Fixed by #320

Comments

@grosser
Copy link
Contributor

grosser commented Jul 8, 2019

We have a plugin that runs every 5 minutes. Previously it had a conditions

  "conditions": [
    {
      "type": "NetworkConnectivityProblem",
      "reason": "NetworkConnectivityNormal",
      "message": "Network connectivity is normal"
    }
  ],

which we removed because it causes:

... but now it does not update LastHeartbeatTime at all, even though we see in the logs that it runs (succeeds) every 5 minutes. So we have to chose between 2 broken states ... please add a fix or allow some config like heartBeatMakesSense: true that updates the heartbeat only when the plugin runs 🤷‍♂

FYI rest of the plugin config

{
  "plugin": "custom",
  "pluginConfig": {
    "invoke_interval": "5m",
    "timeout": "60s",
    "max_output_length": 80,
    "concurrency": 1
  },
  "source": "node-network-connectivity",
  "rules": [
    {
      "type": "permanent",
      "condition": "NetworkConnectivityProblem",
      "reason": "NetworkConnectivityFail",
      "path": "./config/plugin/network_connectivity.rb",
      "timeout": "60s"
    }
  ]
}

/cc @eatwithforks @xueweiz @wangzhen127

@grosser
Copy link
Contributor Author

grosser commented Jul 8, 2019

... welp additionally not even failures get reported when conditions are removed, this is a nice trap in general, mis-spell conditions ... to bad, your plugin reports to /dev/null now 🤷‍♂

@wangzhen127
Copy link
Member

If there is no condition object, the heartbeat does not have a place to live in. I think we should just fix #304 and then you have a default condition, where you can have the heartbeat.

LastHeartbeatTime to be updated every minute even if it did not run

I think this is by design. The heartbeat is for the condition object, not for the plugin.

@grosser
Copy link
Contributor Author

grosser commented Jul 9, 2019

Then if there is no conditions object the config should crash/be marked invalid since it will not work ? (same for if the condition does not match the rules ?)

Hmm if the heartbeat is for the condition object, then it should update whenever any plugin for it runs maybe <-> whenever it is "reconsidered" ... not sure if that makes sense :)

@wangzhen127
Copy link
Member

Then if there is no conditions object the config should crash/be marked invalid since it will not work ? (same for if the condition does not match the rules ?)

Why is that? No condition does not mean the plugin is not running. It means the permanent problem does not occur yet.

Hmm if the heartbeat is for the condition object, then it should update whenever any plugin for it runs maybe <-> whenever it is "reconsidered" ... not sure if that makes sense :)

It can be more frequent than that. See how often it gets updated here:
https://github.com/kubernetes/node-problem-detector/blob/master/pkg/exporters/k8sexporter/condition/manager.go#L146
and
https://github.com/kubernetes/node-problem-detector/blob/master/pkg/exporters/k8sexporter/condition/manager.go#L117

@grosser
Copy link
Contributor Author

grosser commented Jul 9, 2019

If there is no condition, the permanent problem will not get set even if it happens.
So we need to either make the condition mandatory or change that behavior.

@wangzhen127
Copy link
Member

If there is no condition, the permanent problem will not get set even if it happens.

I am lost. Why is that? When a permanent problem occur, the corresponding condition object will be created (if not existed before), right?

@grosser
Copy link
Contributor Author

grosser commented Jul 10, 2019

only if there was conditions to begin with ... I tried not having conditions so they would not set/reset the "default" state on deploy ... I can try with empty conditions or bogus conditions value, maybe that works 🤷‍♂

@grosser
Copy link
Contributor Author

grosser commented Jul 10, 2019

confirmed that it never creates a status when conditions is empty or condition was not defined with the exact same name

@wangzhen127
Copy link
Member

That sounds like a bug to me. I haven't got time to test it myself. Do you mind describing the reproduce steps here?

/cc @Random-Liu
/cc @andyxning

@grosser
Copy link
Contributor Author

grosser commented Jul 12, 2019

create a custom plugin without a condition and a failing permanent rule -> never set

@wangzhen127
Copy link
Member

Do you mind sharing your config file and how you generate the failing rule and attach the NPD logs here?

@grosser
Copy link
Contributor Author

grosser commented Jul 12, 2019

see above, just needs a script that does not work like exit 1

@Random-Liu
Copy link
Member

@grosser The current way that NPD works is that the condition you defined in permanent rules have to exist in conditions.

#306 (comment) doesn't have it, thus no condition is set.

If there is no condition object, the heartbeat does not have a place to live in. I think we should just fix #304 and then you have a default condition, where you can have the heartbeat.

Agree.

@grosser
Copy link
Contributor Author

grosser commented Jul 26, 2019

so that should be part of the validation then ? <-> fail fast when config is invalid ?

@wangzhen127
Copy link
Member

so that should be part of the validation then ? <-> fail fast when config is invalid ?

Makes sense. Do you mind sending a fix? :)

@wangzhen127
Copy link
Member

Just added the validation on the config in #320

@grosser
Copy link
Contributor Author

grosser commented Aug 8, 2019

thx 🎉

@wangzhen127
Copy link
Member

The fix is now included in NPD v0.6.6. @grosser, can you double check if your problem is fixed?

@grosser
Copy link
Contributor Author

grosser commented Aug 14, 2019

deployed 0.7.0 with "conditions": [], ... no startup or runtime errors in sight and heartbeat is no longer updated ... so not fixed

@wangzhen127
Copy link
Member

The fix is not included in 0.7.0. The fix is in v0.6.6 and will be in 0.7.1 (not yet released)

@grosser
Copy link
Contributor Author

grosser commented Aug 15, 2019 via email

@wangzhen127
Copy link
Member

Sorry, I misunderstood and miscommunicated. The fix on custom plugin was cherry picked. The validation for the config was not cherry picked (no plan to include in v0.6.x, will be included in 0.7.1)

@wangzhen127
Copy link
Member

Now v0.7.1 is released, which should include the validation for the config. Can you double check again?

@grosser
Copy link
Contributor Author

grosser commented Aug 30, 2019

finally failed now :)

node-problem-detector F0830 17:31:58.373878      31 custom_plugin_monitor.go:78] Failed to validate custom plugin config {Plugin:custom PluginGlobalConfig:{InvokeIntervalString:0xc0003a1c50 TimeoutString:0xc0003a1c60 InvokeInterval:1m0s Timeout:1m0s MaxOutputLength:0xc000481178 Concurrency:0xc000481188 EnableMessageChangeBasedConditionUpdate:0x20095a4} Source: DefaultConditions:[] Rules:[0xc0003cc540] EnableMetricsReporting:0x1f8208c}: Permanent problem NetworkConnectivity does not have preset default condition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants