DOC-13170 Product Change- PR #143536 - metric: add /metrics endpoint with static labels #19823

Open: wants to merge 6 commits into base: main
4 changes: 2 additions & 2 deletions src/current/_includes/v25.3/cdc/metrics-labels.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
To measure metrics per changefeed, you can define a "metrics label" for one or multiple changefeed(s). The changefeed(s) will increment each [changefeed metric]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#metrics). Metrics label information is sent with time-series metrics to `http://{host}:{http-port}/_status/vars`, viewable via the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). An aggregated metric of all changefeeds is also measured.
To measure metrics per changefeed, you can define a "metrics label" for one or multiple changefeed(s). The changefeed(s) will increment each [changefeed metric]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#metrics). Metrics label information is sent with time-series metrics to the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). An aggregated metric of all changefeeds is also measured.

It is necessary to consider the following when applying metrics labels to changefeeds:

- The `server.child_metrics.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) must be set to `true` before using the `metrics_label` option. `server.child_metrics.enabled` is enabled by default in {{ site.data.products.standard }} and {{ site.data.products.basic }}.
- Metrics label information is sent to the `_status/vars` endpoint, but will **not** show up in [`debug.zip`]({% link {{ page.version.version }}/cockroach-debug-zip.md %}) or the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}).
- Metrics label information is sent to the Prometheus endpoint, but will **not** show up in [`debug.zip`]({% link {{ page.version.version }}/cockroach-debug-zip.md %}) or the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}).
- Introducing labels to isolate a changefeed's metrics can increase cardinality significantly. There is a limit of 1024 unique labels in place to prevent cardinality explosion. That is, when labels are applied to high-cardinality data (data with a higher number of unique values), each changefeed with a label then results in more metrics data to multiply together, which will grow over time. This will have an impact on performance as the metric-series data per changefeed quickly populates against its label.
- The maximum length of a metrics label is 128 bytes.
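
The two limits in the list above (at most 1024 unique labels, and at most 128 bytes per label) can be checked before any changefeed is created. The following is a minimal client-side sketch in Python, not part of CockroachDB: the helper name `check_metrics_labels` is hypothetical, and it assumes label length is measured in UTF-8 bytes. The cluster remains the authority on what it accepts.

```python
MAX_LABEL_BYTES = 128      # maximum metrics label length stated in the list above
MAX_UNIQUE_LABELS = 1024   # limit on unique labels stated in the list above


def check_metrics_labels(labels):
    """Return a list of problems found in a planned set of metrics labels.

    Hypothetical pre-check: an empty list means the labels fit both limits.
    Assumes label length is counted in UTF-8 bytes.
    """
    problems = []
    unique = set(labels)
    if len(unique) > MAX_UNIQUE_LABELS:
        problems.append(
            f"{len(unique)} unique labels exceeds the limit of {MAX_UNIQUE_LABELS}"
        )
    for label in sorted(unique):
        size = len(label.encode("utf-8"))
        if size > MAX_LABEL_BYTES:
            problems.append(
                f"label starting {label[:16]!r} is {size} bytes; "
                f"the maximum is {MAX_LABEL_BYTES}"
            )
    return problems


if __name__ == "__main__":
    # A short ASCII label passes; a 129-character label does not.
    print(check_metrics_labels(["orders", "x" * 129]))
```

Counting bytes rather than characters matters for non-ASCII labels: 65 two-byte characters already exceed the 128-byte maximum.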
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
As explained in more detail [in our monitoring documentation]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint), each CockroachDB node exports a wide variety of metrics at `http://<host>:<http-port>/_status/vars` in the format used by the popular Prometheus timeseries database. Two of these metrics export how close each node's clock is to the clock of all other nodes:
As explained in more detail [in our monitoring documentation]({% link {{ page.version.version }}/prometheus-endpoint.md %}), each CockroachDB node exports a wide variety of metrics in the format used by the popular Prometheus timeseries database. Two of these metrics export how close each node's clock is to the clock of all other nodes:

Metric | Definition
-------|-----------
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
{{site.data.alerts.callout_info}}
If the cluster becomes unavailable, the DB Console and Cluster API will also become unavailable. You can continue to monitor the cluster via the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) and [logs]({% link {{ page.version.version }}/logging-overview.md %}).
If the cluster becomes unavailable, the DB Console and Cluster API will also become unavailable. You can continue to monitor the cluster via the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) and [logs]({% link {{ page.version.version }}/logging-overview.md %}).
{{site.data.alerts.end}}
Original file line number Diff line number Diff line change
@@ -355,6 +355,12 @@
"/${VERSION}/enable-node-map.html"
]
},
{
"title": "Prometheus Endpoint",
"urls": [
"/${VERSION}/prometheus-endpoint.html"
]
},
{
"title": "Use Prometheus and Alertmanager",
"urls": [
2 changes: 1 addition & 1 deletion src/current/v25.3/api-support-policy.md
Original file line number Diff line number Diff line change
@@ -98,7 +98,7 @@ A *mixed* API includes both stable and unstable features.
[cockroach-commands]: {% link {{ page.version.version }}/cockroach-commands.md %}
[cockroach-sql]: {% link {{ page.version.version }}/cockroach-sql.md %}
[health-endpoints]: {% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-endpoints
[prometheus-endpoint]: {% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint
[prometheus-endpoint]: {% link {{ page.version.version }}/prometheus-endpoint.md %}
[cluster-api]: {% link {{ page.version.version }}/cluster-api.md %}
[db-console]: {% link {{ page.version.version }}/ui-overview.md %}
[logging-overview]: {% link {{ page.version.version }}/logging-overview.md %}
2 changes: 1 addition & 1 deletion src/current/v25.3/backup-and-restore-monitoring.md
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@ You can access the [Prometheus Endpoint](#prometheus-endpoint) to track and aler

## Prometheus endpoint

You can access the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) (`http://<host>:<http-port>/_status/vars`) for backup and restore metrics.
You can access the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) for backup and restore metrics.

Refer to the [Monitor CockroachDB with Prometheus]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) tutorial for guidance on installing and setting up Prometheus and Alertmanager to track metrics.

4 changes: 2 additions & 2 deletions src/current/v25.3/datadog.md
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@ Uncomment the following line in `cockroachdb.d/conf.yaml`:
- prometheus_url: http://localhost:8080/_status/vars
~~~

This enables metrics collection via our [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint).
This enables metrics collection via our [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}).

### Configure security certificates

@@ -175,7 +175,7 @@ The timeseries graph at the top of the page indicates the configured metric and

If you rely on external tools such as Datadog for storing and visualizing your cluster's time-series metrics, Cockroach Labs recommends that you [disable the DB Console's storage of time-series metrics]({% link {{ page.version.version }}/operational-faqs.md %}#disable-time-series-storage).

When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. You can create queries, visualizations, and alerts in Datadog based on the data it is collecting from your cluster's Prometheus endpoint.
When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. You can create queries, visualizations, and alerts in Datadog based on the data it is collecting from your cluster's Prometheus endpoint.

## Known limitations

Original file line number Diff line number Diff line change
@@ -8,7 +8,7 @@ When using [Third-Party Monitoring Integrations]({% link {{ page.version.version

## CockroachDB’s Timeseries Database

CockroachDB stores metrics data in its own internal timeseries database (TSDB). While CockroachDB exposes [point-in-time metrics]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) via its [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint), `/_status/vars`, it also periodically scrapes and writes this information to its own timeseries database storage. This data is scraped every 10 seconds and stored at that resolution. After a period of time determined by the [`timeseries.storage.resolution_10s.ttl` cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-10s-ttl), that 10 second resolution data is compacted into a 30 minute resolution. After another period of time determined by the [`timeseries.storage.resolution_30m.ttl cluster setting`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-30m-ttl), the 30 minute resolution data is deleted.
CockroachDB stores metrics data in its own internal timeseries database (TSDB). While CockroachDB exposes [point-in-time metrics]({% link {{ page.version.version }}/monitor-cockroachdb-with-prometheus.md %}) via its [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}), it also periodically scrapes and writes this information to its own timeseries database storage. This data is scraped every 10 seconds and stored at that resolution. After a period of time determined by the [`timeseries.storage.resolution_10s.ttl` cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-10s-ttl), that 10 second resolution data is compacted into a 30 minute resolution. After another period of time determined by the [`timeseries.storage.resolution_30m.ttl cluster setting`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-timeseries-storage-resolution-30m-ttl), the 30 minute resolution data is deleted.

The data in TSDB is used to populate metrics charts in DB Console. TSDB provides its own implementations to compute functions, such as rate-of-change, maximum, sum, etc. It also provides its own implementation to perform downsampling.
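
To make the compaction from 10-second to 30-minute resolution concrete, here is a minimal Python sketch of bucket-based downsampling. It is an illustration only: it assumes each 30-minute bucket is summarized by a plain mean, which may not match what the internal TSDB actually stores, and the function name `downsample` is hypothetical.

```python
from collections import defaultdict


def downsample(samples, bucket_seconds=1800):
    """Group (unix_timestamp_seconds, value) samples into fixed-width buckets.

    Returns a list of (bucket_start, mean_value) sorted by bucket start.
    Averaging is an assumption made for this sketch; a real rollup may
    keep other aggregates as well.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


if __name__ == "__main__":
    # One hour of 10-second samples collapses into two 30-minute points.
    ten_second_samples = [(t, float(t)) for t in range(0, 3600, 10)]
    print(downsample(ten_second_samples))
```

A sketch like this also shows why a third-party system can display different values for the same period: the result depends on the bucket width and on the aggregate chosen.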

@@ -33,7 +33,7 @@ Datadog scrapes every 60s | 0 | - | - | - | - | - | 0

Since Cockroach Labs does not own the third-party systems, we can not be expected to have intimate knowledge about how each system’s different query language and timeseries database works.

The [metrics export feature]({% link cockroachcloud/export-metrics.md %}) scrapes the `/_status/vars` endpoint every 30 seconds, and forwards the data along to the third-party system. The metrics export does no intermediate aggregation, downsampling, or modification of the timeseries values at any point. The raw metrics export data is at a 30-second resolution, but how that data is processed once received by the third party system is unknown to us.
The [metrics export feature]({% link cockroachcloud/export-metrics.md %}) scrapes the Prometheus endpoint every 30 seconds, and forwards the data along to the third-party system. The metrics export does no intermediate aggregation, downsampling, or modification of the timeseries values at any point. The raw metrics export data is at a 30-second resolution, but how that data is processed once received by the third party system is unknown to us.

It is within our scope to understand and support our own timeseries database. If you have problems receiving metrics in your third-party system, [our support]({% link {{ page.version.version }}/support-resources.md %}) can help troubleshoot those problems. However, once the data is ingested into the third-party system, please contact your representative at that third-party company to support issues found on those systems. For example, assuming the raw metric data has been ingested as expected, Cockroach Labs does not support writing queries in third-party systems, such as using Datadog's Metrics Explorer or Datadog Query Language (DQL).
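
When troubleshooting whether raw metrics arrive as expected, it can help to look at what a single scrape of the Prometheus endpoint contains before any third-party processing. The following Python sketch parses the Prometheus text exposition format into `(name, labels, value)` tuples. It is deliberately minimal: it ignores `# HELP` and `# TYPE` comments and optional timestamps, and the metric and label names in the sample are illustrative, not actual CockroachDB metric names.

```python
import re

# metric_name, optional {labels}, then the sample value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label_name="label value", allowing backslash-escaped characters in the value.
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse Prometheus text-format samples into (name, labels, value) tuples.

    Minimal sketch: comment lines are skipped, timestamps are ignored,
    and escape sequences inside label values are left as-is.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    # Illustrative scrape body; names are made up for this example.
    body = (
        "# TYPE example_total counter\n"
        'example_total{node_id="1",scope="orders"} 42\n'
        "example_gauge 1.5\n"
    )
    for sample in parse_exposition(body):
        print(sample)
```

For production use, an established Prometheus client library is a better choice than a hand-written parser; this sketch only shows the shape of the data.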

4 changes: 2 additions & 2 deletions src/current/v25.3/kibana.md
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@ toc: true
docs_area: manage
---

[Kibana](https://www.elastic.co/kibana/) is a platform that visualizes data on the [Elastic Stack](https://www.elastic.co/elastic-stack/). This page shows how to use the [CockroachDB module for Metricbeat](https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-cockroachdb.html) to collect metrics exposed by your CockroachDB {{ site.data.products.core }} cluster's [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) in Elasticsearch and how to visualize those metrics with Kibana.
[Kibana](https://www.elastic.co/kibana/) is a platform that visualizes data on the [Elastic Stack](https://www.elastic.co/elastic-stack/). This page shows how to use the [CockroachDB module for Metricbeat](https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-cockroachdb.html) to collect metrics exposed by your CockroachDB {{ site.data.products.core }} cluster's [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}) in Elasticsearch and how to visualize those metrics with Kibana.

{{site.data.alerts.callout_success}}
To export metrics from a CockroachDB {{ site.data.products.cloud }} cluster, refer to [Export Metrics From a CockroachDB {{ site.data.products.dedicated }} Cluster]({% link cockroachcloud/export-metrics.md %}) instead of this page.
@@ -115,7 +115,7 @@ Click **Refresh**. The query metrics will appear on the dashboard:

If you rely on external tools such as Kibana for storing and visualizing your cluster's time-series metrics, Cockroach Labs recommends that you [disable the DB Console's storage of time-series metrics]({% link {{ page.version.version }}/operational-faqs.md %}#disable-time-series-storage).

When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. You can create queries, visualizations, and alerts in Kibana based on the data it is collecting from your cluster's Prometheus endpoint.
When storage of time-series metrics is disabled, the cluster continues to expose its metrics via the [Prometheus endpoint]({% link {{ page.version.version }}/prometheus-endpoint.md %}). The DB Console stops storing new time-series cluster metrics and eventually deletes historical data. The Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available. You can create queries, visualizations, and alerts in Kibana based on the data it is collecting from your cluster's Prometheus endpoint.

## See also

2 changes: 1 addition & 1 deletion src/current/v25.3/load-based-splitting.md
Original file line number Diff line number Diff line change
@@ -81,7 +81,7 @@ indicates that CockroachDB wants to [split the range]({% link {{ page.version.ve

Usually this log message can be ignored, unless it repeatedly shows up, which can indicate there is a load imbalance problem in the cluster. If there is a load imbalance problem, it could be because a [hot range]({% link {{ page.version.version }}/ui-hot-ranges-page.md %}) cannot be split (because it's really a [hot key]({% link {{ page.version.version }}/ui-hot-ranges-page.md %}#range-report)).

You can see how often a split key cannot be found over time by looking at the following [time-series metric]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint):
You can see how often a split key cannot be found over time by looking at the following [time-series metric]({% link {{ page.version.version }}/prometheus-endpoint.md %}):

- `kv.loadsplitter.nosplitkey`
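
To turn this metric into "how often over time", compare its value across two scrapes. The Python sketch below assumes the metric is exposed as a monotonically increasing counter (an assumption, not confirmed by the text above) and that the two values were read from consecutive scrapes a known number of seconds apart. The helper name `counter_rate` is hypothetical.

```python
def counter_rate(previous, current, interval_seconds):
    """Return the per-second increase of a counter between two scrapes.

    If the counter went backwards, assume it was reset (for example by a
    node restart) and count the increase from zero.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        delta = current
    return delta / interval_seconds


if __name__ == "__main__":
    # 30 more occurrences across a 30-second scrape interval.
    print(counter_rate(100, 130, 30))
```

A rate that stays above zero across many scrapes corresponds to the repeated log message described above, which can indicate a load imbalance.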

2 changes: 1 addition & 1 deletion src/current/v25.3/manage-a-backup-schedule.md
Original file line number Diff line number Diff line change
@@ -50,7 +50,7 @@ Further guidance on connecting to Amazon S3, Google Cloud Storage, Azure Storage

## Set up monitoring for the backup schedule

We recommend that you [monitor your backup schedule with Prometheus]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint), and alert when there are anomalies such as backups that have failed or no backups succeeding over a certain amount of time—at which point, you can inspect schedules by running [`SHOW SCHEDULES`]({% link {{ page.version.version }}/show-schedules.md %}).
We recommend that you [monitor your backup schedule with Prometheus]({% link {{ page.version.version }}/prometheus-endpoint.md %}), and alert when there are anomalies such as backups that have failed or no backups succeeding over a certain amount of time—at which point, you can inspect schedules by running [`SHOW SCHEDULES`]({% link {{ page.version.version }}/show-schedules.md %}).

Metrics for scheduled backups fall into two categories:

Original file line number Diff line number Diff line change
@@ -93,7 +93,7 @@ As part of normal operation, CockroachDB continuously records [metrics]({% link

- [Export Metrics From a CockroachDB Standard Cluster]({% link cockroachcloud/export-metrics.md %})
- [Export Metrics From a CockroachDB Advanced Cluster]({% link cockroachcloud/export-metrics-advanced.md %})
- [Prometheus endpoint for a self-hosted cluster]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint)
- [Prometheus endpoint for a self-hosted cluster]({% link {{ page.version.version }}/prometheus-endpoint.md %})

The following metrics related to contention are available across all deployment types:
