Commit 7c108eb

Create initial structure
1 parent 9a1de54 commit 7c108eb

21 files changed

+3055
-0
lines changed

doc/book/admin/index.rst

Lines changed: 1 addition & 0 deletions
@@ -41,3 +41,4 @@ This chapter includes the following sections:
4141
os_notes
4242
bug_reports
4343
troubleshoot
44+
monitoring

doc/book/admin/monitoring.rst

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
.. _monitoring:
2+
3+
Monitoring
4+
==========
5+
6+
Monitoring is the process of measuring and tracking Tarantool performance based on metrics.
7+
The metrics are typically monitored in real time, which allows you to identify or predict issues.
8+
9+
.. toctree::
10+
:maxdepth: 1
11+
:numbered: 0
12+
13+
monitoring/getting_started
14+
monitoring/plugins
15+
monitoring/grafana_dashboard
16+
monitoring/alerting

doc/book/admin/monitoring/alerting.rst

Lines changed: 416 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
.. _monitoring-getting_started:
2+
3+
Getting started with monitoring
4+
===============================
5+
.. IMPORTANT::
6+
7+
TODO: Write new content
8+
9+
.. NOTE::
10+
11+
If you use a Tarantool version below `2.11.1 <https://github.com/tarantool/tarantool/releases/tag/2.11.1>`__,
12+
install the latest version of `metrics <https://github.com/tarantool/metrics>`__ first.
13+
14+
15+
1. Config (``config.yaml``):
16+
17+
.. literalinclude:: /code_snippets/snippets/config/instances.enabled/metrics/config.yaml
18+
:end-at: expose_prometheus_metrics
19+
:language: yaml
20+
:dedent:
21+
22+
2. Serve metrics using a role (``expose_prometheus_metrics.lua``):
23+
24+
.. literalinclude:: /code_snippets/snippets/config/instances.enabled/metrics/examples/expose_prometheus_metrics.lua
25+
:language: lua
26+
:dedent:
27+
28+
3. Prometheus scrape config:
29+
30+
.. literalinclude:: /code_snippets/snippets/config/instances.enabled/metrics/prometheus.yml
31+
:language: yaml
32+
:dedent:
33+
34+
Example on GitHub: `metrics <https://github.com/tarantool/doc/tree/latest/doc/code_snippets/snippets/config/instances.enabled/metrics>`_
Lines changed: 249 additions & 0 deletions
@@ -0,0 +1,249 @@
1+
.. _monitoring-grafana_dashboard-page:
2+
3+
===============================================================================
4+
Grafana dashboard
5+
===============================================================================
6+
7+
.. IMPORTANT::
8+
9+
TODO:
10+
11+
- Update to 3.x config
12+
- Use built-in 3.x roles for **crud**
13+
- Use built-in 3.x roles for **expirationd**
14+
- Remove Cartridge content
15+
- Remove TDG content
16+
17+
Tarantool Grafana dashboard is available as part of
18+
`Grafana Official & community built dashboards <https://grafana.com/grafana/dashboards>`_.
19+
There's a version
20+
`for Prometheus data source <https://grafana.com/grafana/dashboards/13054>`_
21+
and one `for InfluxDB data source <https://grafana.com/grafana/dashboards/12567>`_.
22+
There are also separate dashboards for TDG applications:
23+
`for Prometheus data source <https://grafana.com/grafana/dashboards/16406>`_
24+
and `for InfluxDB data source <https://grafana.com/grafana/dashboards/16405>`_.
25+
The Tarantool Grafana dashboard is a ready-to-import template with basic memory,
26+
space operations, and HTTP load panels, based on default `metrics <https://github.com/tarantool/metrics>`_
27+
package functionality.
28+
29+
The dashboard requires ``metrics`` **0.15.0** or newer for the complete experience;
30+
the ``'alias'`` :ref:`global label <metrics-api_reference-labels>` must be set on each instance
31+
to display panels properly (for example, it is provided by the ``cartridge.roles.metrics`` role).
32+
33+
To support `CRUD <https://github.com/tarantool/crud>`_ statistics, install ``CRUD``
34+
**0.11.1** or newer. Call ``crud.cfg`` on the router to enable CRUD statistics collection
35+
with latency quantiles.
36+
37+
.. code-block:: lua
38+
39+
crud.cfg{
40+
stats = true,
41+
stats_driver = 'metrics',
42+
stats_quantiles = true
43+
}
44+
45+
To support `expirationd <https://github.com/tarantool/expirationd>`_ statistics,
46+
install ``expirationd`` **1.2.0** or newer. Call ``expirationd.cfg`` on an instance
47+
to enable statistics export.
48+
49+
.. code-block:: lua
50+
51+
expirationd.cfg{metrics = true}
52+
53+
.. image:: images/Prometheus_dashboard_1.png
54+
:width: 30%
55+
56+
.. image:: images/Prometheus_dashboard_2.png
57+
:width: 30%
58+
59+
.. image:: images/Prometheus_dashboard_3.png
60+
:width: 30%
61+
62+
.. _monitoring-grafana_dashboard-monitoring_stack:
63+
64+
-------------------------------------------------------------------------------
65+
Prepare a monitoring stack
66+
-------------------------------------------------------------------------------
67+
68+
Grafana dashboards are available for both Prometheus and InfluxDB data sources,
69+
so you can use either of the following stacks:
70+
71+
- `Telegraf <https://www.influxdata.com/time-series-platform/telegraf/>`_
72+
as a server agent for collecting metrics, `InfluxDB <https://www.influxdata.com/>`_
73+
as a time series database for storing metrics, and `Grafana <https://grafana.com/>`_
74+
as a visualization platform; or
75+
- `Prometheus <https://prometheus.io/>`_ as both a server agent for collecting metrics
76+
and a time series database for storing metrics, and `Grafana <https://grafana.com/>`_
77+
as a visualization platform.
78+
79+
For issues with setting up Prometheus, Telegraf, InfluxDB, or Grafana instances,
80+
please refer to the corresponding project's documentation.
81+
82+
.. _monitoring-grafana_dashboard-collect_metrics:
83+
84+
-------------------------------------------------------------------------------
85+
Collect metrics with server agents
86+
-------------------------------------------------------------------------------
87+
88+
To collect metrics for Prometheus, first set up metrics output in the
89+
``prometheus`` format. You can use :ref:`cartridge.roles.metrics <monitoring-getting_started-cartridge_role>`
90+
configuration or set up the :ref:`Prometheus output plugin <metrics-plugins-available>`
91+
manually. To start collecting metrics,
92+
`add a job <https://prometheus.io/docs/prometheus/latest/getting_started/#configure-prometheus-to-monitor-the-sample-targets>`_
93+
to the Prometheus configuration, with each Tarantool instance URI as a target and
94+
the metrics path as configured on the Tarantool instances:
95+
96+
.. code-block:: yaml
97+
98+
scrape_configs:
99+
- job_name: tarantool
100+
static_configs:
101+
- targets:
102+
- "example_project:8081"
103+
- "example_project:8082"
104+
- "example_project:8083"
105+
metrics_path: "/metrics/prometheus"
106+
107+
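
As a quick sanity check of the job above, the following minimal Python sketch shows how each target and ``metrics_path`` combine into the URL that Prometheus requests. The helper name ``scrape_urls`` is hypothetical and is not part of Prometheus or Tarantool; the defaults mirror Prometheus's documented defaults (``http`` scheme, ``/metrics`` path).

```python
def scrape_urls(targets, metrics_path="/metrics", scheme="http"):
    """Return the URL Prometheus would request for each static target.

    Hypothetical helper for illustration only; it performs no network calls.
    """
    # Normalize the path so exactly one slash separates host and path.
    path = "/" + metrics_path.lstrip("/")
    return [f"{scheme}://{target}{path}" for target in targets]


# Targets and path taken from the scrape configuration above.
targets = [
    "example_project:8081",
    "example_project:8082",
    "example_project:8083",
]
for url in scrape_urls(targets, "/metrics/prometheus"):
    print(url)
```

Opening one of the printed URLs in a browser or with ``curl`` is a simple way to confirm that an instance actually serves metrics at the configured path.
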
108+
To collect metrics for InfluxDB, use the Telegraf agent.
109+
First, configure Tarantool metrics output in ``json`` format
110+
with :ref:`cartridge.roles.metrics <monitoring-getting_started-cartridge_role>`
111+
configuration or the corresponding :ref:`JSON output plugin <metrics-plugins-available>`.
112+
To start collecting metrics, add an `http input <https://github.com/influxdata/telegraf/blob/release-1.17/plugins/inputs/http/README.md>`_
113+
to the Telegraf configuration, including each Tarantool instance's metrics URL:
114+
115+
.. code-block:: toml
116+
117+
[[inputs.http]]
118+
urls = [
119+
"http://example_project:8081/metrics/json",
120+
"http://example_project:8082/metrics/json",
121+
"http://example_project:8083/metrics/json"
122+
]
123+
timeout = "30s"
124+
tag_keys = [
125+
"metric_name",
126+
"label_pairs_alias",
127+
"label_pairs_quantile",
128+
"label_pairs_path",
129+
"label_pairs_method",
130+
"label_pairs_status",
131+
"label_pairs_operation",
132+
"label_pairs_level",
133+
"label_pairs_id",
134+
"label_pairs_engine",
135+
"label_pairs_name",
136+
"label_pairs_index_name",
137+
"label_pairs_delta",
138+
"label_pairs_stream",
139+
"label_pairs_thread",
140+
"label_pairs_kind"
141+
]
142+
insecure_skip_verify = true
143+
interval = "10s"
144+
data_format = "json"
145+
name_prefix = "tarantool_"
146+
fieldpass = ["value"]
147+
148+
Be sure to include each label key as ``label_pairs_<key>`` so that the
149+
plugin extracts it. For example, if you use :code:`{ state = 'ready' }` labels
150+
somewhere in metric collectors, add the ``label_pairs_state`` tag key.
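
The naming rule can be sketched in Python as follows. This is a minimal illustration, not part of Telegraf or ``metrics``: the function name ``telegraf_tag_keys`` is hypothetical, and keeping ``metric_name`` as the first entry simply follows the configuration shown above.

```python
def telegraf_tag_keys(label_keys):
    """Build a Telegraf tag_keys list from metric label keys.

    Hypothetical helper: each label key becomes "label_pairs_<key>".
    """
    # "metric_name" is listed first, as in the example configuration.
    keys = ["metric_name"]
    for key in label_keys:
        tag = f"label_pairs_{key}"
        if tag not in keys:  # skip duplicates, keep first-seen order
            keys.append(tag)
    return keys


# A collector that uses { alias = ..., state = 'ready' } labels:
print(telegraf_tag_keys(["alias", "state"]))
# → ['metric_name', 'label_pairs_alias', 'label_pairs_state']
```
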
151+
152+
For the TDG dashboard, use the following configuration:
153+
154+
.. code-block:: toml
155+
156+
[[inputs.http]]
157+
urls = [
158+
"http://example_tdg_project:8081/metrics/json",
159+
"http://example_tdg_project:8082/metrics/json",
160+
"http://example_tdg_project:8083/metrics/json"
161+
]
162+
timeout = "30s"
163+
tag_keys = [
164+
"metric_name",
165+
"label_pairs_alias",
166+
"label_pairs_quantile",
167+
"label_pairs_path",
168+
"label_pairs_method",
169+
"label_pairs_status",
170+
"label_pairs_operation",
171+
"label_pairs_level",
172+
"label_pairs_id",
173+
"label_pairs_engine",
174+
"label_pairs_name",
175+
"label_pairs_index_name",
176+
"label_pairs_delta",
177+
"label_pairs_stream",
178+
"label_pairs_thread",
179+
"label_pairs_type",
180+
"label_pairs_connector_name",
181+
"label_pairs_broker_name",
182+
"label_pairs_topic",
183+
"label_pairs_request",
184+
"label_pairs_kind",
185+
"label_pairs_thread_name",
186+
"label_pairs_type_name",
187+
"label_pairs_operation_name",
188+
"label_pairs_schema",
189+
"label_pairs_entity",
190+
"label_pairs_status_code"
191+
]
192+
insecure_skip_verify = true
193+
interval = "10s"
194+
data_format = "json"
195+
name_prefix = "tarantool_"
196+
fieldpass = ["value"]
197+
198+
If you connect a Telegraf instance to InfluxDB storage, metrics are stored
199+
under the ``"<name_prefix>http"`` measurement (``"tarantool_http"`` in this example).
200+
201+
.. _monitoring-grafana_dashboard-import:
202+
203+
-------------------------------------------------------------------------------
204+
Import the dashboard
205+
-------------------------------------------------------------------------------
206+
Open the Grafana import menu.
207+
208+
.. image:: images/grafana_import.png
209+
:align: left
210+
211+
To import a specific dashboard, choose one of the following options:
212+
213+
- paste the dashboard ID (``12567`` for the InfluxDB dashboard, ``13054`` for the Prometheus dashboard,
214+
``16405`` for the InfluxDB TDG dashboard, ``16406`` for the Prometheus TDG dashboard), or
215+
- paste a link to the dashboard (
216+
https://grafana.com/grafana/dashboards/12567 for InfluxDB dashboard,
217+
https://grafana.com/grafana/dashboards/13054 for Prometheus dashboard,
218+
https://grafana.com/grafana/dashboards/16405 for InfluxDB TDG dashboard,
219+
https://grafana.com/grafana/dashboards/16406 for Prometheus TDG dashboard), or
220+
- paste the dashboard JSON file contents, or
221+
- upload the dashboard JSON file.
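
The dashboard IDs and links listed above follow one pattern, which the short Python sketch below captures. The lookup keys (``"tarantool"``, ``"tdg"``, ``"influxdb"``, ``"prometheus"``) and the function name are hypothetical labels chosen for illustration; only the numeric IDs and the URL prefix come from this page.

```python
# Dashboard IDs as listed above; the key names are illustrative assumptions.
DASHBOARD_IDS = {
    ("tarantool", "influxdb"): 12567,
    ("tarantool", "prometheus"): 13054,
    ("tdg", "influxdb"): 16405,
    ("tdg", "prometheus"): 16406,
}


def dashboard_url(product, data_source):
    """Return the grafana.com link for the given dashboard variant."""
    dashboard_id = DASHBOARD_IDS[(product, data_source)]
    return f"https://grafana.com/grafana/dashboards/{dashboard_id}"


print(dashboard_url("tarantool", "prometheus"))
# → https://grafana.com/grafana/dashboards/13054
```
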
222+
223+
Set the dashboard name, folder, and UID (if needed).
224+
225+
.. image:: images/grafana_import_setup.png
226+
:align: left
227+
228+
You can choose the data source and data source variables after import.
229+
230+
.. image:: images/grafana_variables_setup.png
231+
:align: left
232+
233+
.. _monitoring-grafana_dashboard-troubleshooting:
234+
235+
-------------------------------------------------------------------------------
236+
Troubleshooting
237+
-------------------------------------------------------------------------------
238+
239+
If there is no data on the graphs, make sure that you picked the data source and job/measurement correctly.
240+
241+
If there is no data on the graphs, make sure that you have the ``info`` group of Tarantool metrics
242+
(in particular, ``tnt_info_uptime``).
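
One way to check for the metric is to scan the Prometheus-format output of an instance. The Python sketch below is a minimal illustration under stated assumptions: the function name ``has_metric`` is hypothetical, the sample text is made up for the example, and the parser handles only the simple ``name{labels} value`` and ``name value`` line shapes of the text exposition format.

```python
def has_metric(exposition_text, metric_name):
    """Return True if a sample line for metric_name is present.

    Hypothetical helper; comment lines (# HELP, # TYPE) are ignored.
    """
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" or at the first whitespace.
        head = line.split("{", 1)[0].split()
        if head and head[0] == metric_name:
            return True
    return False


# Made-up sample of what an instance might return at its metrics path.
sample = """\
# HELP tnt_info_uptime Tarantool uptime
# TYPE tnt_info_uptime gauge
tnt_info_uptime{alias="router"} 42
"""
print(has_metric(sample, "tnt_info_uptime"))
# → True
```
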
243+
244+
If some Prometheus graphs show no data because of ``parse error: missing unit character in duration``,
245+
ensure that you use Grafana 7.2 or newer.
246+
247+
If some Prometheus graphs display ``parse error: bad duration syntax "1m0"`` or a similar error, you need
248+
to update your Prometheus version. See
249+
`grafana/grafana#44542 <https://github.com/grafana/grafana/issues/44542>`_ for more details.