From 73b0a01a75a8b986b191cfd6b44a34bd9957c12a Mon Sep 17 00:00:00 2001 From: Florence Morris Date: Mon, 23 Jun 2025 16:09:19 -0400 Subject: [PATCH 1/4] Added prometheus-endpoint.md with info about metrics endpoint. In monitoring-and-alerting.md, moved info in the existing Prometheus endpoint section to the new page. In self-hosted-deployments.json, added link to new page. --- .../sidebar-data/self-hosted-deployments.json | 6 + src/current/v25.3/monitoring-and-alerting.md | 31 +--- src/current/v25.3/prometheus-endpoint.md | 159 ++++++++++++++++++ 3 files changed, 169 insertions(+), 27 deletions(-) create mode 100644 src/current/v25.3/prometheus-endpoint.md diff --git a/src/current/_includes/v25.3/sidebar-data/self-hosted-deployments.json b/src/current/_includes/v25.3/sidebar-data/self-hosted-deployments.json index d5ce5fe0975..8f997fcaf03 100644 --- a/src/current/_includes/v25.3/sidebar-data/self-hosted-deployments.json +++ b/src/current/_includes/v25.3/sidebar-data/self-hosted-deployments.json @@ -355,6 +355,12 @@ "/${VERSION}/enable-node-map.html" ] }, + { + "title": "Prometheus Endpoint", + "urls": [ + "/${VERSION}/prometheus-endpoint.html" + ] + }, { "title": "Use Prometheus and Alertmanager", "urls": [ diff --git a/src/current/v25.3/monitoring-and-alerting.md b/src/current/v25.3/monitoring-and-alerting.md index 565f8ebf0ed..cdbb0be7ee0 100644 --- a/src/current/v25.3/monitoring-and-alerting.md +++ b/src/current/v25.3/monitoring-and-alerting.md @@ -158,35 +158,12 @@ The [`cockroach node status`]({% link {{ page.version.version }}/cockroach-node. ### Prometheus endpoint -Every node of a CockroachDB cluster exports granular time-series metrics at `http://:/_status/vars`. The metrics are formatted for easy integration with [Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}), an open source tool for storing, aggregating, and querying time-series data.
The Prometheus format is human-readable and can be processed to work with other third-party monitoring systems such as [Sysdig](https://sysdig.atlassian.net/wiki/plugins/servlet/mobile?contentId=64946336#content/view/64946336) and [stackdriver](https://github.com/GoogleCloudPlatform/k8s-stackdriver/tree/master/prometheus-to-sd). Many of the [third-party monitoring integrations]({% link {{ page.version.version }}/third-party-monitoring-tools.md %}), such as [Datadog]({% link {{ page.version.version }}/datadog.md %}) and [Kibana]({% link {{ page.version.version }}/kibana.md %}), collect metrics from a cluster's Prometheus endpoint. +Each node in a CockroachDB cluster exports granular time-series metrics at two available endpoints: -To access the Prometheus endpoint of a cluster running on `localhost:8080`: +- [`http://:/_status/vars`]({% link {{ page.version.version }}/prometheus-endpoint.md %}#status-vars) +- {% include_cached new-in.html version="v25.3" %}[`http://:/metrics`]({% link {{ page.version.version }}/prometheus-endpoint.md %}#metrics) -{% include_cached copy-clipboard.html %} -~~~ shell -$ curl http://localhost:8080/_status/vars -~~~ - -~~~ -# HELP gossip_infos_received Number of received gossip Info objects -# TYPE gossip_infos_received counter -gossip_infos_received 0 -# HELP sys_cgocalls Total number of cgo calls -# TYPE sys_cgocalls gauge -sys_cgocalls 3501 -# HELP sys_cpu_sys_percent Current system cpu percentage -# TYPE sys_cpu_sys_percent gauge -sys_cpu_sys_percent 1.098855319644276e-10 -... -~~~ - -{{site.data.alerts.callout_info}} -In addition to using the exported time-series data to monitor a cluster via an external system, you can write alerting rules against them to make sure you are promptly notified of critical events or issues that may require intervention or investigation. See [Events to alert on](#events-to-alert-on) for more details. 
-{{site.data.alerts.end}} - -If you rely on external tools for storing and visualizing your cluster's time-series metrics, Cockroach Labs recommends that you [disable the DB Console's storage of time-series metrics]({% link {{ page.version.version }}/operational-faqs.md %}#disable-time-series-storage). - -When storage of time-series metrics is disabled, the DB Console Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. +For more information, refer to the [Prometheus Endpoint page]({% link {{ page.version.version }}/prometheus-endpoint.md %}). ### Critical nodes endpoint diff --git a/src/current/v25.3/prometheus-endpoint.md b/src/current/v25.3/prometheus-endpoint.md new file mode 100644 index 00000000000..ead3531a42b --- /dev/null +++ b/src/current/v25.3/prometheus-endpoint.md @@ -0,0 +1,159 @@ +--- +title: Prometheus Endpoint +summary: Export granular time-series metrics in Prometheus format to monitor a cluster's health and performance. +toc: true +--- + +Each node in a CockroachDB cluster exports granular time-series metrics at two available endpoints: + +- [`http://:/_status/vars`](#_status-vars) +- {% include_cached new-in.html version="v25.3" %}[`http://:/metrics`](#metrics): an enhanced endpoint that includes additional static labels + +The metrics are formatted for integration with [Prometheus](https://prometheus.io/), an open source tool for storing, aggregating, and querying time-series data. For details on how to pull these metrics into Prometheus, refer to [Monitor CockroachDB with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}). 
The Prometheus format is human-readable and can be processed to work with other Prometheus-compatible third-party monitoring systems such as [Sysdig](https://sysdig.com/integrations/prometheus/) and [Google Cloud Managed Service for Prometheus](https://cloud.google.com/stackdriver/docs/managed-prometheus). Many of the [third-party monitoring integrations]({% link {{ page.version.version }}/third-party-monitoring-tools.md %}), such as [Datadog]({% link {{ page.version.version }}/datadog.md %}) and [Kibana]({% link {{ page.version.version }}/kibana.md %}), collect metrics from a cluster's Prometheus endpoint. + +{{site.data.alerts.callout_info}} +In addition to using the exported time-series data to monitor a cluster through an external system, you can write alerting rules to ensure prompt notification of critical events or issues that may require intervention or investigation. Refer to [Essential Alerts]({% link {{ page.version.version }}/essential-alerts-self-hosted.md %}) for more details. +{{site.data.alerts.end}} + +If you rely on external tools for storing and visualizing your cluster's time-series metrics, Cockroach Labs recommends that you [disable the DB Console's storage of time-series metrics]({% link {{ page.version.version }}/operational-faqs.md %}#disable-time-series-storage). + +When storage of time-series metrics is disabled, the [DB Console Metrics dashboards]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#metrics-dashboards) are still available, but their visualizations are blank. This occurs because the dashboards rely on data that is no longer available. + +## `_status/vars` + +To access the `_status/vars` Prometheus endpoint of a cluster running on `localhost:8080`: + +{% include_cached copy-clipboard.html %} +~~~ shell +$ curl http://localhost:8080/_status/vars +~~~ + +The output will be similar to the following. Note that each SQL statement type has its own metric name (for example, `sql_select_count` and `sql_select_count_internal`).
+ +~~~ +# HELP sys_cgocalls Total number of cgo calls +# TYPE sys_cgocalls counter +sys_cgocalls{node_id="1",tenant="demoapp"} 13737 +# HELP sys_cpu_sys_percent Current system cpu percentage consumed by the CRDB process +# TYPE sys_cpu_sys_percent gauge +sys_cpu_sys_percent{node_id="1",tenant="demoapp"} 0.0021986027879282717 +... +# HELP sql_select_count_internal Number of SQL SELECT statements successfully executed (internal queries) +# TYPE sql_select_count_internal counter +sql_select_count_internal{node_id="1",tenant="demoapp"} 2115 +... +# HELP sql_delete_count Number of SQL DELETE statements successfully executed +# TYPE sql_delete_count counter +sql_delete_count{node_id="1",tenant="demoapp"} 0 +... +# HELP sql_delete_count_internal Number of SQL DELETE statements successfully executed (internal queries) +# TYPE sql_delete_count_internal counter +sql_delete_count_internal{node_id="1",tenant="demoapp"} 996 +... +# HELP sql_select_count Number of SQL SELECT statements successfully executed +# TYPE sql_select_count counter +sql_select_count{node_id="1",tenant="demoapp"} 9 +... +# HELP sql_insert_count_internal Number of SQL INSERT statements successfully executed (internal queries) +# TYPE sql_insert_count_internal counter +sql_insert_count_internal{node_id="1",tenant="demoapp"} 1201 +... +# HELP sql_update_count Number of SQL UPDATE statements successfully executed +# TYPE sql_update_count counter +sql_update_count{node_id="1",tenant="demoapp"} 0 +... +# HELP sql_update_count_internal Number of SQL UPDATE statements successfully executed (internal queries) +# TYPE sql_update_count_internal counter +sql_update_count_internal{node_id="1",tenant="demoapp"} 1907 +... +# HELP sql_insert_count Number of SQL INSERT statements successfully executed +# TYPE sql_insert_count counter +sql_insert_count{node_id="1",tenant="system"} 12 +sql_insert_count{node_id="1",tenant="demoapp"} 15 +... 
+~~~ + +## `metrics` + +{% include_cached new-in.html version="v25.3" %} + +{{site.data.alerts.callout_info}} +{% include feature-phases/preview.md %} +{{site.data.alerts.end}} + +The `metrics` path is the conventional location for a Prometheus endpoint and is the default `metrics_path` in Prometheus scrape configurations. + +To access the `metrics` Prometheus endpoint of a cluster running on `localhost:8080`: + +{% include_cached copy-clipboard.html %} +~~~ shell +$ curl http://localhost:8080/metrics +~~~ + +The output will be similar to the following. Note that a single metric name, `sql_count`, covers all four statement types. Static labels distinguish the series: `query_type` (with values `insert`, `select`, `update`, and `delete`) and `query_internal` (set to `true` for internal queries). + +~~~ +# HELP sys_cgocalls Total number of cgo calls +# TYPE sys_cgocalls counter +sys_cgocalls{node_id="1",tenant="demoapp"} 13737 +# HELP sys_cpu_sys_percent Current system cpu percentage consumed by the CRDB process +# TYPE sys_cpu_sys_percent gauge +sys_cpu_sys_percent{node_id="1",tenant="demoapp"} 0.0021986027879282717 +... +# HELP sql_count Number of SQL INSERT statements successfully executed (internal queries) +# TYPE sql_count counter +sql_count{node_id="1",tenant="demoapp",query_type="insert",query_internal="true"} 1281 +sql_count{node_id="1",tenant="demoapp",query_type="delete"} 0 +sql_count{node_id="1",tenant="demoapp",query_type="update"} 0 +sql_count{node_id="1",tenant="demoapp",query_type="select",query_internal="true"} 2280 +sql_count{node_id="1",tenant="demoapp",query_type="select"} 9 +sql_count{node_id="1",tenant="demoapp",query_type="insert"} 15 +sql_count{node_id="1",tenant="demoapp",query_type="update",query_internal="true"} 2102 +sql_count{node_id="1",tenant="demoapp",query_type="delete",query_internal="true"} 1067 +... +~~~ + +### Static labels + +Static labels let you segment a single metric across facets, such as statement type, for later querying and aggregation.
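The difference is easiest to see side by side. The following Python sketch is illustrative only and is not part of CockroachDB: it parses short excerpts of the `demoapp` tenant's sample output shown above and totals the non-internal SQL statement counts both ways. With `_status/vars`, the total must name each per-type metric; with `metrics`, it groups the single `sql_count` metric by its `query_type` label. The helper names (`parse_samples`, `total_unlabeled`, `totals_by_query_type`) are hypothetical, and the parser is deliberately minimal: it assumes label values without escaped quotes and sample lines without timestamps, as in the sample output. For real scrapes, a complete parser such as `text_string_to_metric_families` from the Prometheus Python client (`prometheus_client.parser`) is more robust.

```python
import re
from collections import defaultdict

# Hypothetical helper code for illustration; not part of CockroachDB.
# Matches a sample line such as: sql_count{node_id="1",query_type="insert"} 15
# Assumption: no trailing timestamp after the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# Assumption: label values contain no escaped quotes, so a simple pattern suffices.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Excerpts of the demoapp tenant's sample output shown above.
STATUS_VARS = """
# TYPE sql_select_count counter
sql_select_count{node_id="1",tenant="demoapp"} 9
sql_select_count_internal{node_id="1",tenant="demoapp"} 2115
sql_insert_count{node_id="1",tenant="demoapp"} 15
sql_update_count{node_id="1",tenant="demoapp"} 0
sql_delete_count{node_id="1",tenant="demoapp"} 0
"""

METRICS = """
# TYPE sql_count counter
sql_count{node_id="1",tenant="demoapp",query_type="insert",query_internal="true"} 1281
sql_count{node_id="1",tenant="demoapp",query_type="delete"} 0
sql_count{node_id="1",tenant="demoapp",query_type="update"} 0
sql_count{node_id="1",tenant="demoapp",query_type="select",query_internal="true"} 2280
sql_count{node_id="1",tenant="demoapp",query_type="select"} 9
sql_count{node_id="1",tenant="demoapp",query_type="insert"} 15
"""


def parse_samples(text):
    """Yield (name, labels, value) for each sample line; skip comments and '...'."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line == "...":
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


# Unlabeled: every statement type is its own metric name, so this set must be
# updated whenever a new type (and therefore a new metric name) is added.
UNLABELED_NAMES = {"sql_insert_count", "sql_select_count", "sql_update_count", "sql_delete_count"}


def total_unlabeled(text):
    """Sum the per-type metrics from _status/vars; internal variants are excluded."""
    return sum(value for name, _, value in parse_samples(text) if name in UNLABELED_NAMES)


def totals_by_query_type(text, include_internal=False):
    """Group sql_count by its query_type label; new label values appear automatically."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name != "sql_count":
            continue
        if labels.get("query_internal") == "true" and not include_internal:
            continue
        totals[labels["query_type"]] += value
    return dict(totals)


if __name__ == "__main__":
    print(total_unlabeled(STATUS_VARS))                 # 24.0
    print(sum(totals_by_query_type(METRICS).values()))  # 24.0
    print(totals_by_query_type(METRICS, include_internal=True))
```

Both approaches return 24 for these excerpts, but only the labeled version keeps working unchanged if a new `query_type` value appears.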
+ +Unlabeled metrics from the `_status/vars` endpoint | Labeled metrics from the `metrics` endpoint +-----------------------------------------------|----------------------------------------- +`sql_insert_count` | `sql_count{query_type="insert"}` +`sql_select_count` | `sql_count{query_type="select"}` +`sql_update_count` | `sql_count{query_type="update"}` +`sql_delete_count` | `sql_count{query_type="delete"}` + +At query time, labels make related metrics easier to aggregate and explore: + +Unlabeled sum query from the `_status/vars` endpoint | Labeled sum query from the `metrics` endpoint +-----------------------------------------------|------------------------------- +`sum(sql_insert_count + sql_delete_count + sql_select_count)` | `sum(sql_count)` +This query must be modified if new types are added because they will have new metric names. | This query is resilient to new type additions. +Related metrics might be discoverable through autocomplete in a third-party tool, but it may be unclear which metric names belong together. | All label values can be found through a third-party query engine and used to construct a graph with an individual line for each label value. + +Another common scenario occurs when each label value represents a disjoint category. One example is the set of certificate expiration metrics, which differ only by the certificate they refer to. Operators are unlikely to aggregate these, but may still want to view all certificate expiration metrics on a dashboard.
+ +For example, the output from the `metrics` endpoint will be similar to the following: + +~~~ +# HELP security_certificate_expiration Expiration for the CA certificate +# TYPE security_certificate_expiration gauge +security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ca"} 1.998766953e+09 +security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ca-client-tenant"} 0 +security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="node-client"} 0 +security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="client-tenant"} 0 +security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ui"} 0 +security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="client"} 1.840654953e+09 +security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="client-ca"} 0 +security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ui-ca"} 0 +security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="node"} 1.840654953e+09 +~~~ + +## See also + +- [Monitor CockroachDB with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) +- [Third-party Monitoring Integrations]({% link {{ page.version.version }}/third-party-monitoring-tools.md %}) +- [Monitor CockroachDB Self-Hosted Clusters with Datadog]({% link {{ page.version.version }}/datadog.md %}) +- [Monitor CockroachDB with Kibana]({% link {{ page.version.version }}/kibana.md %}) +- [Essential Alerts]({% link {{ page.version.version }}/essential-alerts-self-hosted.md %}) \ No newline at end of file From ac7ade2b165ebf86c08f7842e7c987062841a7f8 Mon Sep 17 00:00:00 2001 From: Florence Morris Date: Mon, 23 Jun 2025 16:20:07 -0400 Subject: [PATCH 2/4] fixed link --- src/current/v25.3/monitoring-and-alerting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/current/v25.3/monitoring-and-alerting.md 
b/src/current/v25.3/monitoring-and-alerting.md index cdbb0be7ee0..139c4210e2e 100644 --- a/src/current/v25.3/monitoring-and-alerting.md +++ b/src/current/v25.3/monitoring-and-alerting.md @@ -160,7 +160,7 @@ The [`cockroach node status`]({% link {{ page.version.version }}/cockroach-node. Each node in a CockroachDB cluster exports granular time-series metrics at two available endpoints: -- [`http://:/_status/vars`]({% link {{ page.version.version }}/prometheus-endpoint.md %}#status-vars) +- [`http://:/_status/vars`]({% link {{ page.version.version }}/prometheus-endpoint.md %}#_status-vars) - {% include_cached new-in.html version="v25.3" %}[`http://:/metrics`]({% link {{ page.version.version }}/prometheus-endpoint.md %}#metrics) For more information, refer to the [Prometheus Endpoint page]({% link {{ page.version.version }}/prometheus-endpoint.md %}). From d7ea811f23f1bf85e4eb06861e3a15576d5de1dd Mon Sep 17 00:00:00 2001 From: Florence Morris Date: Tue, 24 Jun 2025 16:52:02 -0400 Subject: [PATCH 3/4] Replace instances of ({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) with ({% link {{ page.version.version }}/prometheus-endpoint.md %}). Replace instances of (#prometheus-endpoint) with ({% link {{ page.version.version }}/prometheus-endpoint.md %}). 
--- .../_includes/v25.3/cdc/metrics-labels.md | 2 +- .../faq/clock-synchronization-monitoring.md | 2 +- .../cluster-unavailable-monitoring.md | 2 +- src/current/v25.3/api-support-policy.md | 2 +- .../v25.3/backup-and-restore-monitoring.md | 2 +- src/current/v25.3/datadog.md | 4 ++-- ...y-monitoring-integrations-and-db-console.md | 2 +- src/current/v25.3/kibana.md | 4 ++-- src/current/v25.3/load-based-splitting.md | 2 +- src/current/v25.3/manage-a-backup-schedule.md | 2 +- ...nitor-and-analyze-transaction-contention.md | 2 +- .../v25.3/monitor-and-debug-changefeeds.md | 2 +- .../monitor-cockroachdb-with-prometheus.md | 2 +- src/current/v25.3/monitoring-and-alerting.md | 14 +++++++------- src/current/v25.3/multi-dimensional-metrics.md | 4 ++-- src/current/v25.3/pause-job.md | 2 +- .../v25.3/third-party-monitoring-tools.md | 18 +++++++++--------- 17 files changed, 34 insertions(+), 34 deletions(-) diff --git a/src/current/_includes/v25.3/cdc/metrics-labels.md b/src/current/_includes/v25.3/cdc/metrics-labels.md index 6f97eaffcdd..5e12995ca78 100644 --- a/src/current/_includes/v25.3/cdc/metrics-labels.md +++ b/src/current/_includes/v25.3/cdc/metrics-labels.md @@ -1,4 +1,4 @@ -To measure metrics per changefeed, you can define a "metrics label" for one or multiple changefeed(s). The changefeed(s) will increment each [changefeed metric]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#metrics). Metrics label information is sent with time-series metrics to `http://{host}:{http-port}/_status/vars`, viewable via the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). An aggregated metric of all changefeeds is also measured. +To measure metrics per changefeed, you can define a "metrics label" for one or multiple changefeed(s). The changefeed(s) will increment each [changefeed metric]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#metrics). 
Metrics label information is sent with time-series metrics to `http://{host}:{http-port}/_status/vars`, viewable via the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). An aggregated metric of all changefeeds is also measured. It is necessary to consider the following when applying metrics labels to changefeeds: diff --git a/src/current/_includes/v25.3/faq/clock-synchronization-monitoring.md b/src/current/_includes/v25.3/faq/clock-synchronization-monitoring.md index c3022ad1a32..f08f4144d31 100644 --- a/src/current/_includes/v25.3/faq/clock-synchronization-monitoring.md +++ b/src/current/_includes/v25.3/faq/clock-synchronization-monitoring.md @@ -1,4 +1,4 @@ -As explained in more detail [in our monitoring documentation]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint), each CockroachDB node exports a wide variety of metrics at `http://:/_status/vars` in the format used by the popular Prometheus timeseries database. Two of these metrics export how close each node's clock is to the clock of all other nodes: +As explained in more detail [in our monitoring documentation]({% link {{ page.version.version }}/prometheus-endpoint.md %}), each CockroachDB node exports a wide variety of metrics at `http://:/_status/vars` in the format used by the popular Prometheus timeseries database. 
Two of these metrics export how close each node's clock is to the clock of all other nodes: Metric | Definition -------|----------- diff --git a/src/current/_includes/v25.3/prod-deployment/cluster-unavailable-monitoring.md b/src/current/_includes/v25.3/prod-deployment/cluster-unavailable-monitoring.md index 70f7e08e47f..52b1f18dc11 100644 --- a/src/current/_includes/v25.3/prod-deployment/cluster-unavailable-monitoring.md +++ b/src/current/_includes/v25.3/prod-deployment/cluster-unavailable-monitoring.md @@ -1,3 +1,3 @@ {{site.data.alerts.callout_info}} -If the cluster becomes unavailable, the DB Console and Cluster API will also become unavailable. You can continue to monitor the cluster via the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) and [logs]({% link {{ page.version.version }}/logging-overview.md %}). +If the cluster becomes unavailable, the DB Console and Cluster API will also become unavailable. You can continue to monitor the cluster via the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) and [logs]({% link {{ page.version.version }}/logging-overview.md %}). {{site.data.alerts.end}} \ No newline at end of file diff --git a/src/current/v25.3/api-support-policy.md b/src/current/v25.3/api-support-policy.md index c4f747ed8bc..d2e8b5dcc83 100644 --- a/src/current/v25.3/api-support-policy.md +++ b/src/current/v25.3/api-support-policy.md @@ -98,7 +98,7 @@ A *mixed* API includes both stable and unstable features. 
[cockroach-commands]: {% link {{ page.version.version }}/cockroach-commands.md %} [cockroach-sql]: {% link {{ page.version.version }}/cockroach-sql.md %} [health-endpoints]: {% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-endpoints -[prometheus-endpoint]: {% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint +[prometheus-endpoint]: {% link {{ page.version.version }}/prometheus-endpoint.md %} [cluster-api]: {% link {{ page.version.version }}/cluster-api.md %} [db-console]: {% link {{ page.version.version }}/ui-overview.md %} [logging-overview]: {% link {{ page.version.version }}/logging-overview.md %} diff --git a/src/current/v25.3/backup-and-restore-monitoring.md b/src/current/v25.3/backup-and-restore-monitoring.md index 53dcea1ac93..27365154e90 100644 --- a/src/current/v25.3/backup-and-restore-monitoring.md +++ b/src/current/v25.3/backup-and-restore-monitoring.md @@ -17,7 +17,7 @@ You can access the [Prometheus Endpoint](#prometheus-endpoint) to track and aler ## Prometheus endpoint -You can access the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) (`http://:/_status/vars`) for backup and restore metrics. +You can access the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) (`http://:/_status/vars`) for backup and restore metrics. Refer to the [Monitor CockroachDB with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) tutorial for guidance on installing and setting up Prometheus and Alertmanager to track metrics. 
diff --git a/src/current/v25.3/datadog.md b/src/current/v25.3/datadog.md index fb8035ba685..d8a2e105685 100644 --- a/src/current/v25.3/datadog.md +++ b/src/current/v25.3/datadog.md @@ -48,7 +48,7 @@ Uncomment the following line in `cockroachdb.d/conf.yaml`: - prometheus_url: http://localhost:8080/_status/vars ~~~ -This enables metrics collection via our [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). +This enables metrics collection via our [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). ### Configure security certificates @@ -175,7 +175,7 @@ The timeseries graph at the top of the page indicates the configured metric and If you rely on external tools such as Datadog for storing and visualizing your cluster's time-series metrics, Cockroach Labs recommends that you [disable the DB Console's storage of time-series metrics]({% link {{ page.version.version }}/operational-faqs.md %}#disable-time-series-storage). -When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. You can create queries, visualizations, and alerts in Datadog based on the data it is collecting from your cluster's Prometheus endpoint. +When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. 
The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. You can create queries, visualizations, and alerts in Datadog based on the data it is collecting from your cluster's Prometheus endpoint. ## Known limitations diff --git a/src/current/v25.3/differences-in-metrics-between-third-party-monitoring-integrations-and-db-console.md b/src/current/v25.3/differences-in-metrics-between-third-party-monitoring-integrations-and-db-console.md index 0210883d4ae..4fe9c965b45 100644 --- a/src/current/v25.3/differences-in-metrics-between-third-party-monitoring-integrations-and-db-console.md +++ b/src/current/v25.3/differences-in-metrics-between-third-party-monitoring-integrations-and-db-console.md @@ -8,7 +8,7 @@ When using [Third-Party Monitoring Integrations]({% link {{ page.version.version ## CockroachDB’s Timeseries Database -CockroachDB stores metrics data in its own internal timeseries database (TSDB). While CockroachDB exposes [point-in-time metrics]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) via its [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint), `/_status/vars`, it also periodically scrapes and writes this information to its own timeseries database storage. This data is scraped every 10 seconds and stored at that resolution. After a period of time determined by the [`timeseries.storage.resolution_10s.ttl` cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-10s-ttl), that 10 second resolution data is compacted into a 30 minute resolution. After another period of time determined by the [`timeseries.storage.resolution_30m.ttl cluster setting`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-30m-ttl), the 30 minute resolution data is deleted. 
+CockroachDB stores metrics data in its own internal timeseries database (TSDB). While CockroachDB exposes [point-in-time metrics]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) via its [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}), `/_status/vars`, it also periodically scrapes and writes this information to its own timeseries database storage. This data is scraped every 10 seconds and stored at that resolution. After a period of time determined by the [`timeseries.storage.resolution_10s.ttl` cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-10s-ttl), that 10 second resolution data is compacted into a 30 minute resolution. After another period of time determined by the [`timeseries.storage.resolution_30m.ttl cluster setting`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-30m-ttl), the 30 minute resolution data is deleted. The data in TSDB is used to populate metrics charts in DB Console. TSDB provides its own implementations to compute functions, such as rate-of-change, maximum, sum, etc. It also provides its own implementation to perform downsampling. diff --git a/src/current/v25.3/kibana.md b/src/current/v25.3/kibana.md index 7d3cd6341c9..c9e20a63f58 100644 --- a/src/current/v25.3/kibana.md +++ b/src/current/v25.3/kibana.md @@ -5,7 +5,7 @@ toc: true docs_area: manage --- -[Kibana](https://www.elastic.co/kibana/) is a platform that visualizes data on the [Elastic Stack](https://www.elastic.co/elastic-stack/). 
This page shows how to use the [CockroachDB module for Metricbeat](https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-cockroachdb.html) to collect metrics exposed by your CockroachDB {{ site.data.products.core }} cluster's [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) in Elasticsearch and how to visualize those metrics with Kibana. +[Kibana](https://www.elastic.co/kibana/) is a platform that visualizes data on the [Elastic Stack](https://www.elastic.co/elastic-stack/). This page shows how to use the [CockroachDB module for Metricbeat](https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-cockroachdb.html) to collect metrics exposed by your CockroachDB {{ site.data.products.core }} cluster's [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) in Elasticsearch and how to visualize those metrics with Kibana. {{site.data.alerts.callout_success}} To export metrics from a CockroachDB {{ site.data.products.cloud }} cluster, refer to [Export Metrics From a CockroachDB {{ site.data.products.dedicated }} Cluster]({% link cockroachcloud/export-metrics.md %}) instead of this page. @@ -115,7 +115,7 @@ Click **Refresh**. The query metrics will appear on the dashboard: If you rely on external tools such as Kibana for storing and visualizing your cluster's time-series metrics, Cockroach Labs recommends that you [disable the DB Console's storage of time-series metrics]({% link {{ page.version.version }}/operational-faqs.md %}#disable-time-series-storage). -When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. 
The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. You can create queries, visualizations, and alerts in Kibana based on the data it is collecting from your cluster's Prometheus endpoint. +When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. You can create queries, visualizations, and alerts in Kibana based on the data it is collecting from your cluster's Prometheus endpoint. ## See also diff --git a/src/current/v25.3/load-based-splitting.md b/src/current/v25.3/load-based-splitting.md index cd8e999a409..df20aa952cc 100644 --- a/src/current/v25.3/load-based-splitting.md +++ b/src/current/v25.3/load-based-splitting.md @@ -81,7 +81,7 @@ indicates that CockroachDB wants to [split the range]({% link {{ page.version.ve Usually this log message can be ignored, unless it repeatedly shows up, which can indicate there is a load imbalance problem in the cluster. If there is a load imbalance problem, it could be because a [hot range]({% link {{ page.version.version }}/ui-hot-ranges-page.md %}) cannot be split (because it's really a [hot key]({% link {{ page.version.version }}/ui-hot-ranges-page.md %}#range-report)). 
-You can see how often a split key cannot be found over time by looking at the following [time-series metric]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint): +You can see how often a split key cannot be found over time by looking at the following [time-series metric]({% link {{ page.version.version }}/prometheus-endpoint.md %}): - `kv.loadsplitter.nosplitkey` diff --git a/src/current/v25.3/manage-a-backup-schedule.md b/src/current/v25.3/manage-a-backup-schedule.md index 9667aad28ba..1f08333e3b9 100644 --- a/src/current/v25.3/manage-a-backup-schedule.md +++ b/src/current/v25.3/manage-a-backup-schedule.md @@ -50,7 +50,7 @@ Further guidance on connecting to Amazon S3, Google Cloud Storage, Azure Storage ## Set up monitoring for the backup schedule -We recommend that you [monitor your backup schedule with Prometheus]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint), and alert when there are anomalies such as backups that have failed or no backups succeeding over a certain amount of time—at which point, you can inspect schedules by running [`SHOW SCHEDULES`]({% link {{ page.version.version }}/show-schedules.md %}). +We recommend that you [monitor your backup schedule with Prometheus]({% link {{ page.version.version }}/prometheus-endpoint.md %}), and alert when there are anomalies such as backups that have failed or no backups succeeding over a certain amount of time—at which point, you can inspect schedules by running [`SHOW SCHEDULES`]({% link {{ page.version.version }}/show-schedules.md %}). 
Metrics for scheduled backups fall into two categories: diff --git a/src/current/v25.3/monitor-and-analyze-transaction-contention.md b/src/current/v25.3/monitor-and-analyze-transaction-contention.md index 2e4c649d2b9..7cbdb016fde 100644 --- a/src/current/v25.3/monitor-and-analyze-transaction-contention.md +++ b/src/current/v25.3/monitor-and-analyze-transaction-contention.md @@ -93,7 +93,7 @@ As part of normal operation, CockroachDB continuously records [metrics]({% link - [Export Metrics From a CockroachDB Standard Cluster]({% link cockroachcloud/export-metrics.md %}) - [Export Metrics From a CockroachDB Advanced Cluster]({% link cockroachcloud/export-metrics-advanced.md %}) -- [Prometheus endpoint for a self-hosted cluster]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) +- [Prometheus endpoint for a self-hosted cluster]({% link {{ page.version.version }}/prometheus-endpoint.md %}) The following metrics related to contention are available across all deployment types: diff --git a/src/current/v25.3/monitor-and-debug-changefeeds.md b/src/current/v25.3/monitor-and-debug-changefeeds.md index cbcbc098b5d..fe5cf2ae0eb 100644 --- a/src/current/v25.3/monitor-and-debug-changefeeds.md +++ b/src/current/v25.3/monitor-and-debug-changefeeds.md @@ -24,7 +24,7 @@ The following define the categories of non-retryable errors: - The changefeed cannot convert the data to the specified [output format]({% link {{ page.version.version }}/changefeed-messages.md %}). For example, there are [Avro]({% link {{ page.version.version }}/changefeed-messages.md %}#avro) types that changefeeds do not support, or a [CDC query]({% link {{ page.version.version }}/cdc-queries.md %}) is using an unsupported or malformed expression. - The terminal error happens as part of established changefeed behavior. 
For example, you have specified the [`schema_change_policy=stop` option]({% link {{ page.version.version }}/create-changefeed.md %}#schema-change-policy) and a schema change happens. -We recommend monitoring changefeeds with [Prometheus]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) to avoid accumulation of garbage after a changefeed encounters an error. See [Garbage collection and changefeeds]({% link {{ page.version.version }}/protect-changefeed-data.md %}) for more detail on how changefeeds interact with [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) and garbage collection. In addition, see the [Recommended changefeed metrics to track](#recommended-changefeed-metrics-to-track) section for the essential metrics to track on a changefeed. +We recommend monitoring changefeeds with [Prometheus]({% link {{ page.version.version }}/prometheus-endpoint.md %}) to avoid accumulation of garbage after a changefeed encounters an error. See [Garbage collection and changefeeds]({% link {{ page.version.version }}/protect-changefeed-data.md %}) for more detail on how changefeeds interact with [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) and garbage collection. In addition, see the [Recommended changefeed metrics to track](#recommended-changefeed-metrics-to-track) section for the essential metrics to track on a changefeed. 
## Monitor a changefeed diff --git a/src/current/v25.3/monitor-cockroachdb-with-prometheus.md b/src/current/v25.3/monitor-cockroachdb-with-prometheus.md index 4d391044080..4795535f52c 100644 --- a/src/current/v25.3/monitor-cockroachdb-with-prometheus.md +++ b/src/current/v25.3/monitor-cockroachdb-with-prometheus.md @@ -205,7 +205,7 @@ Although Prometheus lets you graph metrics, [Grafana](https://grafana.com/) is a If you rely on external tools such as Prometheus for storing and visualizing your cluster's time-series metrics, Cockroach Labs recommends that you [disable the DB Console's storage of time-series metrics]({% link {{ page.version.version }}/operational-faqs.md %}#disable-time-series-storage). -When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. You can create queries, visualizations, and alerts in Prometheus and AlertManager based on the data Prometheus is collecting from your cluster's Prometheus endpoint. +When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. 
You can create queries, visualizations, and alerts in Prometheus and Alertmanager based on the data Prometheus is collecting from your cluster's Prometheus endpoint.
## See also diff --git a/src/current/v25.3/monitoring-and-alerting.md b/src/current/v25.3/monitoring-and-alerting.md index 139c4210e2e..1438b6948f9 100644 --- a/src/current/v25.3/monitoring-and-alerting.md +++ b/src/current/v25.3/monitoring-and-alerting.md @@ -21,7 +21,7 @@ This page describes the monitoring and observability tools that are built into C CockroachDB includes several tools to help you monitor your cluster's workloads and performance. {{site.data.alerts.callout_danger}} -If a cluster becomes unavailable, most of the monitoring tools in the following sections become unavailable. In that case, Cockroach Labs recommends that you consult the [cluster logs]({% link {{ page.version.version }}/logging-overview.md %}). To maintain access to a cluster's historical metrics when the cluster is unavailable, configure a [third-party monitoring tool]({% link {{ page.version.version }}/third-party-monitoring-tools.md %}) like Prometheus or Datadog to collect metrics periodically from the [Prometheus endpoint](#prometheus-endpoint). The metrics are stored outside the cluster, and can be used to help troubleshoot what led up to an outage. +If a cluster becomes unavailable, most of the monitoring tools in the following sections become unavailable. In that case, Cockroach Labs recommends that you consult the [cluster logs]({% link {{ page.version.version }}/logging-overview.md %}). To maintain access to a cluster's historical metrics when the cluster is unavailable, configure a [third-party monitoring tool]({% link {{ page.version.version }}/third-party-monitoring-tools.md %}) like Prometheus or Datadog to collect metrics periodically from the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). The metrics are stored outside the cluster, and can be used to help troubleshoot what led up to an outage. 
{{site.data.alerts.end}} ### DB Console @@ -36,7 +36,7 @@ The [Metrics dashboards]({% link {{ page.version.version }}/ui-overview-dashboar To learn more, refer to [Overview Dashboard]({% link {{ page.version.version }}/ui-overview-dashboard.md %}). -Each cluster automatically exposes its metrics at an [endpoint in Prometheus format](#prometheus-endpoint), enabling you to collect them in an external tool like Datadog or your own Prometheus, Grafana, and AlertManager instances. These tools: +Each cluster automatically exposes its metrics at an [endpoint in Prometheus format]({% link {{ page.version.version }}/prometheus-endpoint.md %}), enabling you to collect them in an external tool like Datadog or your own Prometheus, Grafana, and AlertManager instances. These tools: - Collect metrics from the cluster's Prometheus endpoint at an interval you define. - Store historical metrics according to your data retention requirements. @@ -45,17 +45,17 @@ Each cluster automatically exposes its metrics at an [endpoint in Prometheus for Metrics collected by the DB Console are stored within the cluster, and the SQL queries that create the reports on the Metrics dashboards also impose load on the cluster. To avoid this additional load, or if you rely on external tools for storing and visualizing your cluster's time-series metrics, Cockroach Labs recommends that you [disable the DB Console's storage of time-series metrics]({% link {{ page.version.version }}/operational-faqs.md %}#disable-time-series-storage). -When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint](#prometheus-endpoint). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. 
+When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. #### SQL Activity pages The SQL Activity pages, which are located within **SQL Activity** in DB Console, provide information about SQL [statements]({% link {{ page.version.version }}/ui-statements-page.md %}), [transactions]({% link {{ page.version.version }}/ui-transactions-page.md %}), and [sessions]({% link {{ page.version.version }}/ui-sessions-page.md %}). -The information on the SQL Activity pages comes from the cluster's [`crdb_internal` system catalog](#crdb_internal-system-catalog). It is not exported via the cluster's [Prometheus endpoint](#prometheus-endpoint). +The information on the SQL Activity pages comes from the cluster's [`crdb_internal` system catalog](#crdb_internal-system-catalog). It is not exported via the cluster's [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). ### Cluster API -The [Cluster API]({% link {{ page.version.version }}/cluster-api.md %}) is a REST API that runs in the cluster and provides much of the same information about your cluster and nodes as is available from the [DB Console](#db-console) or the [Prometheus endpoint](#prometheus-endpoint), and is accessible from each node at the same address and port as the DB Console. 
+The [Cluster API]({% link {{ page.version.version }}/cluster-api.md %}) is a REST API that runs in the cluster and provides much of the same information about your cluster and nodes as is available from the [DB Console](#db-console) or the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}), and is accessible from each node at the same address and port as the DB Console. If the cluster is unavailable, the Cluster API is also unavailable. @@ -140,7 +140,7 @@ Otherwise, it returns an HTTP `200 OK` status response code with an empty body: {{site.data.alerts.callout_info}} The JSON endpoints are deprecated in favor of the [Cluster API](#cluster-api). -The `/_status/vars` metrics endpoint is in Prometheus format and is not deprecated. For more information, refer to [Prometheus endpoint](#prometheus-endpoint). +The `/_status/vars` metrics endpoint is in Prometheus format and is not deprecated. For more information, refer to [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). {{site.data.alerts.end}} Several endpoints return raw status meta information in JSON at `http://:/#/debug`. You can investigate and use these endpoints, but note that they are subject to change. @@ -1014,7 +1014,7 @@ curl http://localhost:8080/_status/stores/1 In addition to actively monitoring the overall health and performance of a cluster, it is also essential to configure alerting rules that promptly send notifications when CockroachDB experiences events that require investigation or intervention. -Many of the [third-party monitoring integrations]({% link {{ page.version.version }}/third-party-monitoring-tools.md %}), such as [Datadog]({% link {{ page.version.version }}/datadog.md %}) and [Kibana]({% link {{ page.version.version }}/kibana.md %}), also support event-based alerting using metrics collected from a cluster's [Prometheus endpoint](#prometheus-endpoint). Refer to the documentation for an integration for more details. 
This section identifies the most important events that you might want to create alerting rules for, and provides pre-defined rules definitions for these events appropriate for use with Prometheus's open source [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) service. +Many of the [third-party monitoring integrations]({% link {{ page.version.version }}/third-party-monitoring-tools.md %}), such as [Datadog]({% link {{ page.version.version }}/datadog.md %}) and [Kibana]({% link {{ page.version.version }}/kibana.md %}), also support event-based alerting using metrics collected from a cluster's [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). Refer to the documentation for an integration for more details. This section identifies the most important events that you might want to create alerting rules for, and provides predefined rule definitions for these events appropriate for use with Prometheus's open source [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) service. ### Alertmanager diff --git a/src/current/v25.3/multi-dimensional-metrics.md b/src/current/v25.3/multi-dimensional-metrics.md index 325ea252755..7bda723e2c9 100644 --- a/src/current/v25.3/multi-dimensional-metrics.md +++ b/src/current/v25.3/multi-dimensional-metrics.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.metrics --- -Multi-dimensional metrics are additional [Prometheus]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) time series with extra labels. This page will help you understand the potential size of the Prometheus scrape payload for your workload when multi-dimensional metrics are enabled. The number of multi-dimensional metrics can significantly increase based on their associated labels, which increases cardinality. +Multi-dimensional metrics are additional [Prometheus]({% link {{ page.version.version }}/prometheus-endpoint.md %}) time series with extra labels.
This page will help you understand the potential size of the Prometheus scrape payload for your workload when multi-dimensional metrics are enabled. The number of multi-dimensional metrics can significantly increase based on their associated labels, which increases cardinality. The export of multi-dimensional metrics can be enabled by two [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}): @@ -475,7 +475,7 @@ For this reason, child `COUNTER` metrics may not always add up to the parent `CO For `GAUGE` metrics, values may be different and potentially unexpected depending on when a setting is enabled. For an example, refer to [7. `GAUGE` metric example](#7-gauge-metric-example). {{site.data.alerts.end}} -These labels affect only the metrics emitted via [Prometheus export]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). They are not visible in the [DB Console Metrics dashboards]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#metrics-dashboards). +These labels affect only the metrics emitted via [Prometheus export]({% link {{ page.version.version }}/prometheus-endpoint.md %}). They are not visible in the [DB Console Metrics dashboards]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#metrics-dashboards). The system retains up to 5,000 recently used label combinations. diff --git a/src/current/v25.3/pause-job.md b/src/current/v25.3/pause-job.md index badccf424f1..915e955bd52 100644 --- a/src/current/v25.3/pause-job.md +++ b/src/current/v25.3/pause-job.md @@ -63,7 +63,7 @@ You can monitor protected timestamps relating to particular CockroachDB jobs wit - `jobs.{job_type}.protected_age_sec` tracks the oldest protected timestamp record protecting `{job_type}` jobs. As this metric increases, garbage accumulation increases. Garbage collection will not progress on a table, database, or cluster if the protected timestamp record is present. 
- `jobs.{job_type}.protected_record_count` tracks the number of protected timestamp records held by `{job_type}` jobs. -For a full list of the available job types, access your cluster's [`/_status/vars`]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) endpoint. +For a full list of the available job types, access your cluster's [`/_status/vars`]({% link {{ page.version.version }}/prometheus-endpoint.md %}) endpoint. See the following pages for details on metrics: diff --git a/src/current/v25.3/third-party-monitoring-tools.md b/src/current/v25.3/third-party-monitoring-tools.md index 1574fc9d57c..954948fe15d 100644 --- a/src/current/v25.3/third-party-monitoring-tools.md +++ b/src/current/v25.3/third-party-monitoring-tools.md @@ -42,8 +42,8 @@ This list is not exhaustive. Any third-party tool that can consume logs from [fi | CockroachDB Deployment | Integration | Metrics Source | Tutorial | | ---------------------- | ----------- | -------------- | -------- | -| {{ site.data.products.standard }} | [Amazon CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) | [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) | [Export Metrics From a CockroachDB {{ site.data.products.standard }} Cluster]({% link cockroachcloud/export-metrics.md %}?filters=aws-metrics-export) | -| {{ site.data.products.advanced }} | [Amazon CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) | [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) | [Export Metrics From a CockroachDB {{ site.data.products.advanced }} Cluster]({% link cockroachcloud/export-metrics-advanced.md %}?filters=aws-metrics-export) | +| {{ site.data.products.standard }} | [Amazon CloudWatch 
metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) | [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) | [Export Metrics From a CockroachDB {{ site.data.products.standard }} Cluster]({% link cockroachcloud/export-metrics.md %}?filters=aws-metrics-export) | +| {{ site.data.products.advanced }} | [Amazon CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) | [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) | [Export Metrics From a CockroachDB {{ site.data.products.advanced }} Cluster]({% link cockroachcloud/export-metrics-advanced.md %}?filters=aws-metrics-export) | #### Logs @@ -57,9 +57,9 @@ This list is not exhaustive. Any third-party tool that can consume logs from [fi | CockroachDB Deployment | Integration | Metrics Source | Tutorial | | ---------------------- | ----------- | -------------- | -------- | -| {{ site.data.products.standard }} | [CockroachDB {{ site.data.products.cloud }} integration for Datadog](https://docs.datadoghq.com/integrations/cockroachdb_dedicated/) | [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) | [Export Metrics From a CockroachDB {{ site.data.products.standard }} Cluster]({% link cockroachcloud/export-metrics.md %}?filters=datadog-metrics-export) | -| {{ site.data.products.advanced }} | [CockroachDB {{ site.data.products.cloud }} integration for Datadog](https://docs.datadoghq.com/integrations/cockroachdb_dedicated/) | [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) | [Export Metrics From a CockroachDB {{ site.data.products.advanced }} Cluster]({% link cockroachcloud/export-metrics-advanced.md %}?filters=datadog-metrics-export) | -| {{ site.data.products.core }} | [CockroachDB check for Datadog 
Agent](https://docs.datadoghq.com/integrations/cockroachdb/?tab=host) | [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) | [Monitor CockroachDB {{ site.data.products.core }} with Datadog]({% link {{ page.version.version }}/datadog.md %}) | +| {{ site.data.products.standard }} | [CockroachDB {{ site.data.products.cloud }} integration for Datadog](https://docs.datadoghq.com/integrations/cockroachdb_dedicated/) | [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) | [Export Metrics From a CockroachDB {{ site.data.products.standard }} Cluster]({% link cockroachcloud/export-metrics.md %}?filters=datadog-metrics-export) | +| {{ site.data.products.advanced }} | [CockroachDB {{ site.data.products.cloud }} integration for Datadog](https://docs.datadoghq.com/integrations/cockroachdb_dedicated/) | [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) | [Export Metrics From a CockroachDB {{ site.data.products.advanced }} Cluster]({% link cockroachcloud/export-metrics-advanced.md %}?filters=datadog-metrics-export) | +| {{ site.data.products.core }} | [CockroachDB check for Datadog Agent](https://docs.datadoghq.com/integrations/cockroachdb/?tab=host) | [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) | [Monitor CockroachDB {{ site.data.products.core }} with Datadog]({% link {{ page.version.version }}/datadog.md %}) | ### DBmarlin @@ -78,15 +78,15 @@ This list is not exhaustive. 
Any third-party tool that can consume logs from [fi | CockroachDB Deployment | Integration | Metrics Source | Tutorial | | ---------------------- | ----------- | -------------- | -------- | -| {{ site.data.products.core }} | [CockroachDB module for Metricbeat](https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-cockroachdb.html) | [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) | [Monitor CockroachDB {{ site.data.products.core }} with Kibana]({% link {{ page.version.version }}/kibana.md %}) | +| {{ site.data.products.core }} | [CockroachDB module for Metricbeat](https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-cockroachdb.html) | [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) | [Monitor CockroachDB {{ site.data.products.core }} with Kibana]({% link {{ page.version.version }}/kibana.md %}) | ### Prometheus | CockroachDB Deployment | Integration | Metrics Source | Tutorial | | ---------------------- | ----------- | -------------- | -------- | -| {{ site.data.products.standard }} | [Prometheus](https://prometheus.io/) | [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) | [Export Metrics From a CockroachDB {{ site.data.products.standard }} Cluster]({% link cockroachcloud/export-metrics.md %}?filters=prometheus-metrics-export) | -| {{ site.data.products.advanced }} | [Prometheus](https://prometheus.io/) | [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) | [Export Metrics From a CockroachDB {{ site.data.products.advanced }} Cluster]({% link cockroachcloud/export-metrics-advanced.md %}?filters=prometheus-metrics-export) | -| {{ site.data.products.core }} | [Prometheus](https://prometheus.io/) | [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) | [Monitor 
CockroachDB {{ site.data.products.core }} with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) | +| {{ site.data.products.standard }} | [Prometheus](https://prometheus.io/) | [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) | [Export Metrics From a CockroachDB {{ site.data.products.standard }} Cluster]({% link cockroachcloud/export-metrics.md %}?filters=prometheus-metrics-export) | +| {{ site.data.products.advanced }} | [Prometheus](https://prometheus.io/) | [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) | [Export Metrics From a CockroachDB {{ site.data.products.advanced }} Cluster]({% link cockroachcloud/export-metrics-advanced.md %}?filters=prometheus-metrics-export) | +| {{ site.data.products.core }} | [Prometheus](https://prometheus.io/) | [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) | [Monitor CockroachDB {{ site.data.products.core }} with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) | ## See Also From 47ab2a5a4d15f5b0683d3453ecfd582121d739a5 Mon Sep 17 00:00:00 2001 From: Florence Morris Date: Wed, 25 Jun 2025 18:03:06 -0400 Subject: [PATCH 4/4] Replace instances of status/vars with Prometheus endpoint. 
--- .../_includes/v25.3/cdc/metrics-labels.md | 4 ++-- .../faq/clock-synchronization-monitoring.md | 2 +- .../v25.3/backup-and-restore-monitoring.md | 2 +- ...-monitoring-integrations-and-db-console.md | 4 ++-- .../v25.3/monitor-and-debug-changefeeds.md | 2 +- .../v25.3/monitor-cockroachdb-kubernetes.md | 2 +- .../monitor-cockroachdb-with-prometheus.md | 2 +- src/current/v25.3/monitoring-and-alerting.md | 22 +++++++++---------- src/current/v25.3/pause-job.md | 2 +- src/current/v25.3/row-level-ttl.md | 2 +- .../v25.3/work-with-virtual-clusters.md | 2 +- 11 files changed, 23 insertions(+), 23 deletions(-) diff --git a/src/current/_includes/v25.3/cdc/metrics-labels.md b/src/current/_includes/v25.3/cdc/metrics-labels.md index 5e12995ca78..f77904ad61a 100644 --- a/src/current/_includes/v25.3/cdc/metrics-labels.md +++ b/src/current/_includes/v25.3/cdc/metrics-labels.md @@ -1,8 +1,8 @@ -To measure metrics per changefeed, you can define a "metrics label" for one or multiple changefeed(s). The changefeed(s) will increment each [changefeed metric]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#metrics). Metrics label information is sent with time-series metrics to `http://{host}:{http-port}/_status/vars`, viewable via the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). An aggregated metric of all changefeeds is also measured. +To measure metrics per changefeed, you can define a "metrics label" for one or multiple changefeed(s). The changefeed(s) will increment each [changefeed metric]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#metrics). Metrics label information is sent with time-series metrics to the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). An aggregated metric of all changefeeds is also measured. 
It is necessary to consider the following when applying metrics labels to changefeeds: - The `server.child_metrics.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) must be set to `true` before using the `metrics_label` option. `server.child_metrics.enabled` is enabled by default in {{ site.data.products.standard }} and {{ site.data.products.basic }}. -- Metrics label information is sent to the `_status/vars` endpoint, but will **not** show up in [`debug.zip`]({% link {{ page.version.version }}/cockroach-debug-zip.md %}) or the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}). +- Metrics label information is sent to the Prometheus endpoint, but will **not** show up in [`debug.zip`]({% link {{ page.version.version }}/cockroach-debug-zip.md %}) or the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}). - Introducing labels to isolate a changefeed's metrics can increase cardinality significantly. There is a limit of 1024 unique labels in place to prevent cardinality explosion. That is, when labels are applied to high-cardinality data (data with a higher number of unique values), each changefeed with a label then results in more metrics data to multiply together, which will grow over time. This will have an impact on performance as the metric-series data per changefeed quickly populates against its label. - The maximum length of a metrics label is 128 bytes. 
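Because the maximum is stated in bytes, a label containing multibyte UTF-8 characters can exceed it with fewer than 128 characters. The following is a minimal sketch of a pre-flight check you could run before passing a label to `CREATE CHANGEFEED ... WITH metrics_label`. The function name `check_metrics_label` is hypothetical and is not part of CockroachDB; it checks only the 128-byte limit, not the limit on unique labels.

~~~ shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a changefeed metrics label.
# Returns 0 if the label fits within the documented 128-byte maximum,
# or 1 (with a message on stderr) otherwise.
check_metrics_label() {
  local label="$1"
  local bytes
  # wc -c counts bytes, not characters; tr removes the padding some wc versions add.
  bytes=$(printf '%s' "$label" | wc -c | tr -d ' ')
  if [ "$bytes" -gt 128 ]; then
    echo "metrics label is ${bytes} bytes; the maximum is 128" >&2
    return 1
  fi
}
~~~

For example, `check_metrics_label vehicles` succeeds, so `vehicles` fits within the limit for `WITH metrics_label=vehicles`.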
diff --git a/src/current/_includes/v25.3/faq/clock-synchronization-monitoring.md b/src/current/_includes/v25.3/faq/clock-synchronization-monitoring.md index f08f4144d31..a76314c764c 100644 --- a/src/current/_includes/v25.3/faq/clock-synchronization-monitoring.md +++ b/src/current/_includes/v25.3/faq/clock-synchronization-monitoring.md @@ -1,4 +1,4 @@ -As explained in more detail [in our monitoring documentation]({% link {{ page.version.version }}/prometheus-endpoint.md %}), each CockroachDB node exports a wide variety of metrics at `http://:/_status/vars` in the format used by the popular Prometheus timeseries database. Two of these metrics export how close each node's clock is to the clock of all other nodes: +As explained in more detail [in our monitoring documentation]({% link {{ page.version.version }}/prometheus-endpoint.md %}), each CockroachDB node exports a wide variety of metrics in the format used by the popular Prometheus timeseries database. Two of these metrics export how close each node's clock is to the clock of all other nodes: Metric | Definition -------|----------- diff --git a/src/current/v25.3/backup-and-restore-monitoring.md b/src/current/v25.3/backup-and-restore-monitoring.md index 27365154e90..00b4ecb6a0e 100644 --- a/src/current/v25.3/backup-and-restore-monitoring.md +++ b/src/current/v25.3/backup-and-restore-monitoring.md @@ -17,7 +17,7 @@ You can access the [Prometheus Endpoint](#prometheus-endpoint) to track and aler ## Prometheus endpoint -You can access the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) (`http://:/_status/vars`) for backup and restore metrics. +You can access the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) for backup and restore metrics. 
Refer to the [Monitor CockroachDB with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) tutorial for guidance on installing and setting up Prometheus and Alertmanager to track metrics. diff --git a/src/current/v25.3/differences-in-metrics-between-third-party-monitoring-integrations-and-db-console.md b/src/current/v25.3/differences-in-metrics-between-third-party-monitoring-integrations-and-db-console.md index 4fe9c965b45..59565b22c7d 100644 --- a/src/current/v25.3/differences-in-metrics-between-third-party-monitoring-integrations-and-db-console.md +++ b/src/current/v25.3/differences-in-metrics-between-third-party-monitoring-integrations-and-db-console.md @@ -8,7 +8,7 @@ When using [Third-Party Monitoring Integrations]({% link {{ page.version.version ## CockroachDB’s Timeseries Database -CockroachDB stores metrics data in its own internal timeseries database (TSDB). While CockroachDB exposes [point-in-time metrics]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) via its [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}), `/_status/vars`, it also periodically scrapes and writes this information to its own timeseries database storage. This data is scraped every 10 seconds and stored at that resolution. After a period of time determined by the [`timeseries.storage.resolution_10s.ttl` cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-10s-ttl), that 10 second resolution data is compacted into a 30 minute resolution. After another period of time determined by the [`timeseries.storage.resolution_30m.ttl cluster setting`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-30m-ttl), the 30 minute resolution data is deleted. +CockroachDB stores metrics data in its own internal timeseries database (TSDB). 
While CockroachDB exposes [point-in-time metrics]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) via its [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}), it also periodically scrapes and writes this information to its own timeseries database storage. This data is scraped every 10 seconds and stored at that resolution. After a period of time determined by the [`timeseries.storage.resolution_10s.ttl` cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-10s-ttl), that 10-second resolution data is compacted into a 30-minute resolution. After another period of time determined by the [`timeseries.storage.resolution_30m.ttl` cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-30m-ttl), the 30-minute resolution data is deleted. The data in TSDB is used to populate metrics charts in DB Console. TSDB provides its own implementations to compute functions, such as rate-of-change, maximum, sum, etc. It also provides its own implementation to perform downsampling. @@ -33,7 +33,7 @@ Datadog scrapes every 60s | 0 | - | - | - | - | - | 0 Since Cockroach Labs does not own the third-party systems, we can not be expected to have intimate knowledge about how each system’s different query language and timeseries database works. -The [metrics export feature]({% link cockroachcloud/export-metrics.md %}) scrapes the `/_status/vars` endpoint every 30 seconds, and forwards the data along to the third-party system. The metrics export does no intermediate aggregation, downsampling, or modification of the timeseries values at any point. The raw metrics export data is at a 30-second resolution, but how that data is processed once received by the third party system is unknown to us.
+The [metrics export feature]({% link cockroachcloud/export-metrics.md %}) scrapes the Prometheus endpoint every 30 seconds and forwards the data along to the third-party system. The metrics export does no intermediate aggregation, downsampling, or modification of the timeseries values at any point. The raw metrics export data is at a 30-second resolution, but how that data is processed once received by the third-party system is unknown to us. It is within our scope to understand and support our own timeseries database. If you have problems receiving metrics in your third-party system, [our support]({% link {{ page.version.version }}/support-resources.md %}) can help troubleshoot those problems. However, once the data is ingested into the third-party system, please contact your representative at that third-party company to support issues found on those systems. For example, assuming the raw metric data has been ingested as expected, Cockroach Labs does not support writing queries in third-party systems, such as using Datadog's Metrics Explorer or Datadog Query Language (DQL). diff --git a/src/current/v25.3/monitor-and-debug-changefeeds.md b/src/current/v25.3/monitor-and-debug-changefeeds.md index fe5cf2ae0eb..d53e136fa81 100644 --- a/src/current/v25.3/monitor-and-debug-changefeeds.md +++ b/src/current/v25.3/monitor-and-debug-changefeeds.md @@ -104,7 +104,7 @@ Multiple changefeeds can be added to a label: CREATE CHANGEFEED FOR TABLE movr.vehicle_location_histories INTO 'kafka://host:port' WITH metrics_label=vehicles; ~~~ -`http://{host}:{http-port}/_status/vars` shows the defined changefeed(s) by label and the aggregated metric for all changefeeds. This output also shows the `default` scope, which will include changefeeds started without a metrics label: +The [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) shows the defined changefeed(s) by label and the aggregated metric for all changefeeds.
This output also shows the `default` scope, which will include changefeeds started without a metrics label: ~~~ changefeed_running 4 diff --git a/src/current/v25.3/monitor-cockroachdb-kubernetes.md b/src/current/v25.3/monitor-cockroachdb-kubernetes.md index b800eb696dd..b8b24929e4b 100644 --- a/src/current/v25.3/monitor-cockroachdb-kubernetes.md +++ b/src/current/v25.3/monitor-cockroachdb-kubernetes.md @@ -170,7 +170,7 @@ If you're on Hosted GKE, before starting, make sure the email address associated Prometheus graph {{site.data.alerts.callout_success}} - Prometheus auto-completes CockroachDB time series metrics for you, but if you want to see a full listing, with descriptions, port-forward as described in {% if page.secure == true %}[Access the DB Console]({% link {{ page.version.version }}/deploy-cockroachdb-with-kubernetes.md %}#step-4-access-the-db-console){% else %}[Access the DB Console]({% link {{ page.version.version }}/deploy-cockroachdb-with-kubernetes.md %}#step-4-access-the-db-console){% endif %} and then point your browser to http://localhost:8080/_status/vars. + Prometheus auto-completes CockroachDB time series metrics for you, but if you want to see a full listing, with descriptions, port-forward as described in {% if page.secure == true %}[Access the DB Console]({% link {{ page.version.version }}/deploy-cockroachdb-with-kubernetes.md %}#step-4-access-the-db-console){% else %}[Access the DB Console]({% link {{ page.version.version }}/deploy-cockroachdb-with-kubernetes.md %}#step-4-access-the-db-console){% endif %} and then point your browser to the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). For more details on using the Prometheus UI, see their [official documentation](https://prometheus.io/docs/introduction/getting_started/). 
{{site.data.alerts.end}} diff --git a/src/current/v25.3/monitor-cockroachdb-with-prometheus.md b/src/current/v25.3/monitor-cockroachdb-with-prometheus.md index 4795535f52c..ceaa07a3c36 100644 --- a/src/current/v25.3/monitor-cockroachdb-with-prometheus.md +++ b/src/current/v25.3/monitor-cockroachdb-with-prometheus.md @@ -103,7 +103,7 @@ This tutorial explores the CockroachDB {{ site.data.products.core }} integration ~~~ 1. Point your browser to `http://:9090`, where you can use the Prometheus UI to query, aggregate, and graph CockroachDB time series metrics. - - Prometheus auto-completes CockroachDB time series metrics for you, but if you want to see a full listing, with descriptions, point your browser to `http://:8080/_status/vars`. + - Prometheus auto-completes CockroachDB time series metrics for you, but if you want to see a full listing, with descriptions, point your browser to the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). - For more details on using the Prometheus UI, see their [official documentation](https://prometheus.io/docs/introduction/getting_started/). ## Step 4. Send notifications with Alertmanager diff --git a/src/current/v25.3/monitoring-and-alerting.md b/src/current/v25.3/monitoring-and-alerting.md index 1438b6948f9..d0865155ab8 100644 --- a/src/current/v25.3/monitoring-and-alerting.md +++ b/src/current/v25.3/monitoring-and-alerting.md @@ -1107,7 +1107,7 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule:** Send an alert when a node has been down for 15 minutes or more. -- **How to detect:** If a node is down, its `_status/vars` endpoint will return a `Connection refused` error. Otherwise, the `liveness_livenodes` metric will be the total number of live nodes in the cluster. +- **How to detect:** If a node is down, its Prometheus endpoint will return a `Connection refused` error. 
Otherwise, the `liveness_livenodes` metric will be the total number of live nodes in the cluster. - **Rule definition:** Use the `InstanceDead` alert from our pre-defined alerting rules. @@ -1115,7 +1115,7 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule:** Send an alert if a node has restarted more than once in the last 10 minutes. -- **How to detect:** Calculate this using the number of times the `sys_uptime` metric in the node's `_status/vars` output was reset back to zero. The `sys_uptime` metric gives you the length of time, in seconds, that the `cockroach` process has been running. +- **How to detect:** Calculate this using the number of times the `sys_uptime` metric in the node's Prometheus endpoint output was reset back to zero. The `sys_uptime` metric gives you the length of time, in seconds, that the `cockroach` process has been running. - **Rule definition:** Use the `InstanceFlapping` alert from our pre-defined alerting rules. @@ -1123,7 +1123,7 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule:** Send an alert when a node has less than 15% of free space remaining. -- **How to detect:** Divide the `capacity` metric by the `capacity_available` metric in the node's `_status/vars` output. +- **How to detect:** Divide the `capacity_available` metric by the `capacity` metric in the node's Prometheus endpoint output to get the fraction of free space remaining. - **Rule definition:** Use the `StoreDiskLow` alert from our pre-defined alerting rules. @@ -1133,13 +1133,13 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule:** Send an alert when a node is not executing SQL despite having connections. -- **How to detect:** The `sql_conns` metric in the node's `_status/vars` output will be greater than `0` while the `sql_query_count` metric will be `0`.
You can also break this down by statement type using `sql_select_count`, `sql_insert_count`, `sql_update_count`, and `sql_delete_count`. +- **How to detect:** The `sql_conns` metric in the node's Prometheus endpoint output will be greater than `0` while the `sql_query_count` metric will be `0`. You can also break this down by statement type using `sql_select_count`, `sql_insert_count`, `sql_update_count`, and `sql_delete_count`. #### CA certificate expires soon - **Rule:** Send an alert when the CA certificate on a node will expire in less than a year. -- **How to detect:** Calculate this using the `security_certificate_expiration_ca` metric in the node's `_status/vars` output. +- **How to detect:** Calculate this using the `security_certificate_expiration_ca` metric in the node's Prometheus endpoint output. - **Rule definition:** Use the `CACertificateExpiresSoon` alert from our pre-defined alerting rules. @@ -1147,7 +1147,7 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule:** Send an alert when a node's certificate will expire in less than a year. -- **How to detect:** Calculate this using the `security_certificate_expiration_node` metric in the node's `_status/vars` output. +- **How to detect:** Calculate this using the `security_certificate_expiration_node` metric in the node's Prometheus endpoint output. - **Rule definition:** Use the `NodeCertificateExpiresSoon` alert from our pre-defined alerting rules. @@ -1161,7 +1161,7 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule:** Send an alert when the number of ranges with fewer live replicas than needed for quorum is non-zero for too long. -- **How to detect:** Calculate this using the `ranges_unavailable` metric in the node's `_status/vars` output. +- **How to detect:** Calculate this using the `ranges_unavailable` metric in the node's Prometheus endpoint output. 
- **Rule definition:** Use the `UnavailableRanges` alerting rule from your cluster's [`api/v2/rules/` metrics endpoint](#alertmanager). @@ -1169,7 +1169,7 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule:** Send an alert when a replica stops serving traffic due to other replicas being offline for too long. -- **How to detect:** Calculate this using the `kv_replica_circuit_breaker_num_tripped_replicas` metric in the node's `_status/vars` output. +- **How to detect:** Calculate this using the `kv_replica_circuit_breaker_num_tripped_replicas` metric in the node's Prometheus endpoint output. - **Rule definition:** Use the `TrippedReplicaCircuitBreakers` alerting rule from your cluster's [`api/v2/rules/` metrics endpoint](#alertmanager). @@ -1177,7 +1177,7 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule:** Send an alert when the number of ranges with replication below the [replication factor]({% link {{ page.version.version }}/configure-replication-zones.md %}#num_replicas) is non-zero for too long. -- **How to detect:** Calculate this using the `ranges_underreplicated` metric in the node's `_status/vars` output. +- **How to detect:** Calculate this using the `ranges_underreplicated` metric in the node's Prometheus endpoint output. - **Rule definition:** Use the `UnderreplicatedRanges` alerting rule from your cluster's [`api/v2/rules/` metrics endpoint](#alertmanager). @@ -1185,7 +1185,7 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule:** Send an alert when requests are taking a very long time in replication. This can be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits). -- **How to detect:** Calculate this using the `requests_slow_raft` metric in the node's `_status/vars` output. 
+- **How to detect:** Calculate this using the `requests_slow_raft` metric in the node's Prometheus endpoint output. - **Rule definition:** Use the `RequestsStuckInRaft` alerting rule from your cluster's [`api/v2/rules/` metrics endpoint](#alertmanager). @@ -1193,7 +1193,7 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule:** Send an alert when a cluster is getting close to the [open file descriptor limit]({% link {{ page.version.version }}/recommended-production-settings.md %}#file-descriptors-limit). -- **How to detect:** Calculate this using the `sys_fd_softlimit` metric in the node's `_status/vars` output. +- **How to detect:** Calculate this by comparing the `sys_fd_open` metric to the `sys_fd_softlimit` metric in the node's Prometheus endpoint output. - **Rule definition:** Use the `HighOpenFDCount` alerting rule from your cluster's [`api/v2/rules/` metrics endpoint](#alertmanager). diff --git a/src/current/v25.3/pause-job.md b/src/current/v25.3/pause-job.md index 915e955bd52..a3f9eaf1d5a 100644 --- a/src/current/v25.3/pause-job.md +++ b/src/current/v25.3/pause-job.md @@ -63,7 +63,7 @@ You can monitor protected timestamps relating to particular CockroachDB jobs wit - `jobs.{job_type}.protected_age_sec` tracks the oldest protected timestamp record protecting `{job_type}` jobs. As this metric increases, garbage accumulation increases. Garbage collection will not progress on a table, database, or cluster if the protected timestamp record is present. - `jobs.{job_type}.protected_record_count` tracks the number of protected timestamp records held by `{job_type}` jobs. -For a full list of the available job types, access your cluster's [`/_status/vars`]({% link {{ page.version.version }}/prometheus-endpoint.md %}) endpoint. +For a full list of the available job types, access your cluster's [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}).
See the following pages for details on metrics: diff --git a/src/current/v25.3/row-level-ttl.md b/src/current/v25.3/row-level-ttl.md index edac9170806..ecef249544c 100644 --- a/src/current/v25.3/row-level-ttl.md +++ b/src/current/v25.3/row-level-ttl.md @@ -159,7 +159,7 @@ For more information about TTL-related cluster settings, see [View TTL-related c ## TTL metrics -The table below lists the metrics you can use to monitor the effectiveness of your TTL settings. These metrics are visible on the [Advanced Debug Page]({% link {{ page.version.version }}/ui-debug-pages.md %}), as well as at the `_status/vars` endpoint which can be scraped by [Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}). +The table below lists the metrics you can use to monitor the effectiveness of your TTL settings. These metrics are visible on the [Advanced Debug Page]({% link {{ page.version.version }}/ui-debug-pages.md %}), as well as at the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}), which can be scraped by [Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}). | Name | Description | Measurement | Type | |-------------------------------------------+------------------------------------------------------------------+----------------------+-----------| diff --git a/src/current/v25.3/work-with-virtual-clusters.md b/src/current/v25.3/work-with-virtual-clusters.md index 37df0ef3fbf..6adfd10f847 100644 --- a/src/current/v25.3/work-with-virtual-clusters.md +++ b/src/current/v25.3/work-with-virtual-clusters.md @@ -105,7 +105,7 @@ I230815 19:31:07.290757 922 sql/temporary_schema.go:554 ⋮ [T4,demo,n1] 148 fo When cluster virtualization is enabled, metrics are also scoped to a virtual cluster or to the system virtual cluster, and are labeled accordingly.
All metrics are visible from the system virtual cluster, but metrics scoped to the system virtual cluster are not visible from a virtual cluster. Metrics related to SQL activity and jobs are visible only from a virtual cluster. -For example, in the output of the `_status/vars` HTTP endpoint on a cluster with a virtual cluster named `demo`, the metric `sql_txn_commit_count` is shown separately for the `demo` virtual cluster and the system virtual cluster: +For example, in the output of the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) on a cluster with a virtual cluster named `demo`, the metric `sql_txn_commit_count` is shown separately for the `demo` virtual cluster and the system virtual cluster: ~~~ none sql_txn_commit_count{tenant="system"} 0