
cartridge metrics #80

Closed
vasiliy-t opened this issue Jun 18, 2020 · 6 comments
Labels
feature (A new functionality), help wanted (Extra attention is needed)

Comments

@vasiliy-t
Contributor

vasiliy-t commented Jun 18, 2020

We could collect metrics such as:

  1. We could add a new metric issues_count of type gauge. Its value is the number of cluster issues this instance knows about. This should be good enough for basic alerting: a healthy cluster reports 0 issues. - closed in #243 (tnt_cartridge_issues gather only local issues) and #144 (Add Cartridge issues gauge)
  2. Cartridge instance state (like OperationError) as a numerical value -- needs design
  3. Time since last restart - already present as metric tnt_info_uptime
  4. Failover trigger count -- could be derived from the tnt_read_only metric, but needs design too
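Item 4 could be prototyped by watching the read-only flag: each time an instance flips between read-only and writable, its role has most likely been changed by failover. The sketch below is plain Lua with hypothetical names (`new_failover_counter`, `observe`); in a real instance the samples would come from `box.info.ro` or the tnt_read_only metric, and the result would be exported as a counter rather than kept in a table.

```lua
-- Hypothetical sketch: derive a failover trigger count from successive
-- read-only observations (e.g. periodic samples of box.info.ro).
local function new_failover_counter()
    return {
        last_ro = nil,  -- last observed read-only flag (nil = no sample yet)
        triggers = 0,   -- number of observed ro <-> rw flips
    }
end

-- Record one sample; any change of the flag counts as one trigger.
local function observe(counter, read_only)
    if counter.last_ro ~= nil and counter.last_ro ~= read_only then
        counter.triggers = counter.triggers + 1
    end
    counter.last_ro = read_only
    return counter.triggers
end

-- Usage: writable -> demoted to read-only -> promoted back to writable
local c = new_failover_counter()
for _, ro in ipairs({false, false, true, true, false}) do
    observe(c, ro)
end
print(c.triggers) --> 2
```

This is only a proxy, which is part of why the item still needs design: manual leader switches flip the flag too, and two flips between consecutive samples go unnoticed.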
@yngvar-antonsson added the feature (A new functionality) label Dec 1, 2020
@github-actions

This issue is stale because it has been open 60 days with no activity. Remove the stale label or comment, or this will be closed in 30 days.

@github-actions bot added the Stale label Jan 30, 2021
@opomuc
Contributor

opomuc commented Feb 10, 2021

Work on this started in 4f4917d and was released in v0.6.0.

I will close it for now. Feel free to reopen or create a separate issue regarding new metrics in cartridge.

@yngvar-antonsson
Contributor

Reopened with 3 new metrics.

@opomuc
Contributor

opomuc commented Jun 30, 2021

These are the new metrics:

  1. Cartridge instance state (like OperationError) as numerical value
  2. Time since last restart
  3. Failover trigger count

@yngvar-antonsson added the help wanted (Extra attention is needed) label Sep 8, 2021
@rosik
Contributor

rosik commented Oct 5, 2021

Design draft. We've discussed making this part of cartridge itself, where it would be easier to keep the list in sync with the instance states.

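-- Proposed numeric encoding of instance states for a gauge metric.
-- The tens digit groups states by phase; 9x values are error states.
local state_numeric_const = {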
    [''] = 0,
    ['Unconfigured'] = 10,
    ['ConfigFound'] = 11,
    ['ConfigLoaded'] = 12,

    ['BootstrappingBox'] = 20,
    ['RecoveringSnapshot'] = 21,

    ['ConnectingFullmesh'] = 30,

    ['BoxConfigured'] = 40,

    ['ConfiguringRoles'] = 50,
    ['RolesConfigured'] = 51,

    ['ReloadingRoles'] = 60,

    ['InitError'] = 90,
    ['BootError'] = 91,
    ['OperationError'] = 92,
    ['ReloadError'] = 93,
}

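-- Guard: fail loudly if an instance state has no numeric code.
-- `state_transitions` is assumed to be cartridge's own table of allowed
-- state transitions (keyed by state name), in scope where this would live.
for state, _ in pairs(state_transitions) do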
    if state_numeric_const[state] == nil then
        error(string.format('Missing numeric const for %q', state))
    end
end
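The numbering groups states by phase (1x configuration, 2x box bootstrap, 3x full mesh, 4x box configured, 5x roles, 6x reload), with every 9x value an error, so an alert rule can simply fire when the gauge is at least 90. Below is a self-contained Lua sketch of helpers built on that table. The `state_transitions` table here is a small illustrative stand-in, not cartridge's real transition graph, and the helper names (`state_to_number`, `is_error_code`, `numeric_to_state`, `check_transitions`) are hypothetical. Unlike the draft loop, it also validates transition targets, not only source states.

```lua
local state_numeric_const = {
    [''] = 0,
    ['Unconfigured'] = 10, ['ConfigFound'] = 11, ['ConfigLoaded'] = 12,
    ['BootstrappingBox'] = 20, ['RecoveringSnapshot'] = 21,
    ['ConnectingFullmesh'] = 30,
    ['BoxConfigured'] = 40,
    ['ConfiguringRoles'] = 50, ['RolesConfigured'] = 51,
    ['ReloadingRoles'] = 60,
    ['InitError'] = 90, ['BootError'] = 91,
    ['OperationError'] = 92, ['ReloadError'] = 93,
}

-- Illustrative subset only; NOT cartridge's actual transition graph.
local state_transitions = {
    [''] = {'Unconfigured', 'ConfigFound'},
    ['ConfigFound'] = {'ConfigLoaded', 'InitError'},
    ['ConfigLoaded'] = {'RecoveringSnapshot', 'InitError'},
    ['RolesConfigured'] = {'ReloadingRoles', 'OperationError'},
}

-- Map a state name to its gauge value, failing on unknown states.
local function state_to_number(state)
    local n = state_numeric_const[state]
    if n == nil then
        error(string.format('Missing numeric const for %q', state))
    end
    return n
end

-- Every 9x code is an error state.
local function is_error_code(n)
    return n >= 90
end

-- Reverse lookup (gauge value -> state name), e.g. for dashboards.
local numeric_to_state = {}
for state, n in pairs(state_numeric_const) do
    assert(numeric_to_state[n] == nil, 'duplicate numeric const')
    numeric_to_state[n] = state
end

-- Check both source states and every target state they can move to.
local function check_transitions(transitions)
    for state, targets in pairs(transitions) do
        state_to_number(state)
        for _, target in ipairs(targets) do
            state_to_number(target)
        end
    end
end

check_transitions(state_transitions)
print(state_to_number('RolesConfigured'))                --> 51
print(is_error_code(state_to_number('OperationError')))  --> true
print(numeric_to_state[92])                              --> OperationError
```

Keeping the reverse lookup unique (the `assert` in its loop) matters because a dashboard turns the gauge value back into a state name.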

@yngvar-antonsson
Contributor

Closing until someone needs these metrics. Feel free to reopen.

4 participants