Merged
5 changes: 0 additions & 5 deletions docs/admin/bootstrap-checks.rst
@@ -23,11 +23,6 @@ run, but it is still a good idea to follow these instructions.
Docker. Consult the additional documentation on Docker :ref:`resource
constraints <resource_constraints>` for more information.

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-

System settings
===============
34 changes: 17 additions & 17 deletions docs/admin/circuit-breaker.rst
@@ -13,14 +13,14 @@ Think of the miniature breakers inside a household fuse box: if too many applian
trips and cuts power to prevent the wires from melting. The same principle applies in software, only the resource under pressure
is memory, CPU, file descriptors, or an external service.

-In CrateDB, the critical resource is **RAM**. Queries run in parallel across many shards; a single
-oversized aggregation or JOIN can allocate gigabytes in milliseconds. The breaker detects this and aborts the query with a
+In CrateDB, the critical resource is **RAM**. Queries run in parallel across many shards; a single
+oversize aggregation or JOIN can allocate gigabytes in milliseconds. The breaker detects this and aborts the query with a
``CircuitBreakingException`` instead of letting the JVM run out of heap and crash the node.

How Circuit Breakers Work in CrateDB
====================================
A query executes as an ordered set of operations. Before running each stage, CrateDB estimates the extra memory that step will need.
-If the projected total would exceed the breaker limit, the system aborts the query and returns a ``CircuitBreakingException``.
+If the projected total exceeds the breaker limit, the system aborts the query and returns a ``CircuitBreakingException``.
This pre-emptive trip prevents the JVM's garbage collector from reaching an unrecoverable out-of-memory state.

It is important to understand CrateDB doesn’t aspire to do a fully accurate memory accounting, but instead opts for a best-effort approach,
@@ -29,9 +29,9 @@ since a precise estimate is tricky to achieve.
Types of Circuit Breakers
=========================
There are six different Circuit Breaker types which are described in detail in the `cluster settings`_ documentation page: ``query``,
-``request``, ``jobs_log``, ``operations_log``, ``total`` and ``accounting``, which was deprecated and will be removed soon. The ``total`` Circuit Breaker, also
-known as ``parent``, accounts for all others, meaning that it controls the general use of memory, tripping an operation if a
-combination of the circuit breakers threatens the cluster.
+``request``, ``jobs_log``, ``operations_log``, ``total`` and ``accounting``, which was deprecated and will be removed soon. The ``total`` Circuit Breaker, also
+known as ``parent``, accounts for all others, meaning that it controls the general use of memory, tripping an operation if a
+combination of the circuit breakers threatens the cluster.

Monitoring & Observability
==========================
@@ -45,29 +45,29 @@ deployment to collecting metrics and displaying them on a Grafana dashboard.
Exception Handling
==================
.. code-block:: console

CircuitBreakingException[Allocating 2mb for 'query: mergeOnHandler' failed, breaker would use 976.4mb in total. Limit is 972.7mb. Either increase memory and limit, change the query or reduce concurrent query load]

-* **Understanding the error**
+* **Understanding the error**

The memory estimate for **mergeOnHandler** exceeded the ``indices.breaker.query.limit``, so the query was aborted and the
exception returned.


-* **Immediate actions**
+* **Immediate actions**

* **Optimize the query** - see :ref:`Query Optimization 101 <performance-optimization>` for detailed guidance.
* **Identify memory-hungry queries** - run:

.. code-block:: psql

SELECT js.id,
-stmt,
-username,
-sum(used_bytes) sum_bytes
-FROM sys.operations op
-JOIN sys.jobs js ON op.job_id = js.id
-GROUP BY js.id, stmt, username
+stmt,
+username,
+sum(used_bytes) sum_bytes
+FROM sys.operations op
+JOIN sys.jobs js ON op.job_id = js.id
+GROUP BY js.id, stmt, username
ORDER BY sum_bytes DESC;

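The pre-emptive check described in the circuit-breaker changes above (estimate the extra memory first, abort before using it) can be illustrated with a small Python model. This is a hypothetical sketch, not CrateDB's implementation: the ``ToyBreaker`` class, its ``reserve``/``release`` methods and the megabyte bookkeeping are assumptions made for illustration. Only the general shape of the error message is borrowed from the ``CircuitBreakingException`` shown in the diff.

```python
class CircuitBreakingException(Exception):
    """Raised when a reservation would push usage past the breaker limit."""


class ToyBreaker:
    """Hypothetical best-effort memory breaker, for illustration only."""

    def __init__(self, name: str, limit_mb: float) -> None:
        self.name = name
        self.limit_mb = limit_mb
        self.used_mb = 0.0

    def reserve(self, label: str, estimate_mb: float) -> None:
        # Check the projected total *before* accounting for the new memory.
        projected = self.used_mb + estimate_mb
        if projected > self.limit_mb:
            raise CircuitBreakingException(
                f"Allocating {estimate_mb:g}mb for '{self.name}: {label}' failed, "
                f"breaker would use {projected:.1f}mb in total. "
                f"Limit is {self.limit_mb:.1f}mb."
            )
        self.used_mb = projected

    def release(self, estimate_mb: float) -> None:
        # Hand memory back once a stage has finished.
        self.used_mb = max(0.0, self.used_mb - estimate_mb)


breaker = ToyBreaker("query", limit_mb=100.0)
breaker.reserve("collect", 60.0)
try:
    breaker.reserve("mergeOnHandler", 50.0)
except CircuitBreakingException as exc:
    print(exc)
```

Because the check runs before ``used_mb`` is updated, a rejected reservation leaves the accounting unchanged, which mirrors the idea of aborting a query before the JVM heap is actually exhausted.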

5 changes: 0 additions & 5 deletions docs/admin/clustering/logical-replication-setup.rst
@@ -10,11 +10,6 @@ As a publish/subscribe model, it allows a publishing cluster to make certain
tables available for subscription. Subscribing clusters pull changes from a
publication and replay them on their side.

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-
.. _requirements:

Requirements
27 changes: 11 additions & 16 deletions docs/admin/clustering/multi-node-setup.rst
@@ -15,11 +15,6 @@ process :ref:`manually <manual-bootstrapping>`.
This guide shows you how to bootstrap (set up) a multi-node CrateDB cluster
using different methods.

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-

.. _cluster-bootstrapping:

@@ -97,11 +92,11 @@ instructions.

sh$ tar -xzf crate-*.tar.gz

-2. It is common to configure the :ref:`metadata gateway <metadata-gateway>` so
+2. It is common to configure the :ref:`metadata gateway <metadata-gateway>` so
that the cluster waits for all data nodes to be online before starting the
-recovery of the shards. In this case let's set
-`gateway.expected_data_nodes`_ to **3** and
-`gateway.recover_after_data_nodes`_ also to **3**. You can specify these
+recovery of the shards. In this case, let's set
+`gateway.expected_data_nodes`_ to **3** and
+`gateway.recover_after_data_nodes`_ also to **3**. You can specify these
settings in the `configuration`_ file of the unpacked directory.

.. NOTE::
@@ -321,8 +316,8 @@ network partition (also known as a `split-brain`_ scenario).

CrateDB (versions 4.x and above) will automatically determine the ideal `quorum
size`_, but if you are using CrateDB versions 3.x and below, you must manually set
-the quorum size using the `discovery.zen.minimum_master_nodes`_ setting and for
-a three-node cluster, you must declare all nodes to be master-eligible.
+the quorum size using the `discovery.zen.minimum_master_nodes`_ setting. For
+a three-node cluster, you must declare all nodes to be master-eligible.

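For CrateDB 3.x and below, where the quorum has to be set by hand with ``discovery.zen.minimum_master_nodes``, the usual choice is a strict majority of the master-eligible nodes, ``N // 2 + 1``. The Python sketch below assumes that majority rule; the function names are hypothetical and nothing here talks to CrateDB. It also shows why a majority prevents split-brain: after a network partition, at most one side can still reach the quorum.

```python
def majority_quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def sides_able_to_elect(partition_sizes: list[int], master_eligible_nodes: int) -> list[int]:
    """Return the partition sizes that still reach the quorum."""
    quorum = majority_quorum(master_eligible_nodes)
    return [size for size in partition_sizes if size >= quorum]


# A three-node cluster needs 2 master-eligible nodes to elect a master.
print(majority_quorum(3))  # 2
# If the network splits the nodes 2/1, only the two-node side can elect one.
print(sides_able_to_elect([2, 1], 3))  # [2]
```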
.. _metadata-gateway:

@@ -332,13 +327,13 @@ Metadata gateway
When running a multi-node cluster, you can configure the :ref:`metadata gateway <metadata-gateway>`
settings so that CrateDB delays recovery until a certain number of nodes is
available.
-This is useful because if recovery is started when some nodes are down
-CrateDB will proceed on the basis the nodes that are down may not be coming
-back, and it will create new replicas and rebalance shards as necessary.
-This is an expensive operation that, depending on the context, may be better
+This is useful because if recovery is started when some nodes are down
+CrateDB will proceed on the basis that the nodes that are down may not come
+back, creating new replicas and rebalance shards as necessary.
+This is an expensive operation that, depending on the context, may be better
avoided if the nodes are only down for a short period of time.
So, for instance, for a three-nodes cluster, you can decide to set
-`gateway.expected_data_nodes`_ to **3**, and
+`gateway.expected_data_nodes`_ to **3**, and
`gateway.recover_after_data_nodes`_ also to **3**.
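
How the two gateway settings interact can be sketched as a small decision function. This is a simplified, hypothetical Python model based on the descriptions above: recovery does not start while fewer than ``gateway.recover_after_data_nodes`` data nodes are present, and it starts immediately once ``gateway.expected_data_nodes`` are present. The in-between case, in which we assume CrateDB waits for a timeout before recovering anyway, is only labelled, not timed.

```python
def gateway_decision(present: int, expected_data_nodes: int, recover_after_data_nodes: int) -> str:
    """Simplified model of when shard recovery may begin (hypothetical)."""
    if present < recover_after_data_nodes:
        # Below the minimum: keep waiting, do not create replicas yet.
        return "wait"
    if present >= expected_data_nodes:
        # Every expected data node has joined: recover immediately.
        return "recover"
    # Minimum reached but not every node is back: assumed to wait for a timeout.
    return "wait-for-timeout"


# Both settings at 3, as in the three-node example above.
for present in (1, 2, 3):
    print(present, gateway_decision(present, expected_data_nodes=3, recover_after_data_nodes=3))
```

With both values set to 3, a restart in which one node is slow to come back yields ``wait`` instead of an immediate recovery that would create new replicas and move shards for a node that is about to return.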

You can specify both settings in your `configuration`_ file:
5 changes: 0 additions & 5 deletions docs/admin/clustering/multi-zone-setup.rst
@@ -20,11 +20,6 @@ In some cases, it may be necessary to run a cluster across multiple data
centers or availability zones (*zones*, for short). This guide shows you how
to set up a multi-zone CrateDB cluster.

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-

.. _multi-zone-requirements:

5 changes: 0 additions & 5 deletions docs/admin/clustering/scale/kubernetes.rst
@@ -26,11 +26,6 @@ Together, Docker and Kubernetes are a fantastic way to deploy and scale CrateDB.

The official `CrateDB Docker image`_.

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-

.. _scaling-kube-kube:

5 changes: 0 additions & 5 deletions docs/admin/going-into-production.rst
@@ -7,11 +7,6 @@ Going into production
Running CrateDB in different environments requires different approaches. This
document outlines the basics you need to consider when going into production.

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-

.. _prod-bootstrapping:

5 changes: 0 additions & 5 deletions docs/admin/sharding-partitioning.rst
@@ -4,11 +4,6 @@
Sharding and Partitioning
#########################

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-

Introduction
============
8 changes: 0 additions & 8 deletions docs/admin/troubleshooting/crate-node.rst
@@ -14,14 +14,6 @@ Using this command, you can:
the event that you lose too many nodes to be able to form a quorum.
* Detach nodes from an old cluster so they can be moved to a new cluster.

-.. rubric:: Table of contents
-
-.. toctree::
-:maxdepth: 1
-
-.. contents::
-:local:
-

.. _crate-node-repurpose:

5 changes: 0 additions & 5 deletions docs/admin/troubleshooting/jcmd/docker.rst
@@ -21,11 +21,6 @@ how to solve it.
identical to non-containerized applications.


-.. rubric:: Table of contents
-
-.. contents::
-:local:
-
Run ``jcmd`` inside container
=============================

5 changes: 0 additions & 5 deletions docs/admin/troubleshooting/system-tables.rst
@@ -13,11 +13,6 @@ analyze, identify the problem, and start mitigating it. While there is
:ref:`detailed information about all system tables <crate-reference:system-information>`,
this guide runs you through the most common situations.

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-

Step 1: Inspect health checks
=============================
5 changes: 0 additions & 5 deletions docs/admin/upgrade/full.rst
@@ -6,11 +6,6 @@
Full Restart Upgrade
====================

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-
Introduction
============

69 changes: 47 additions & 22 deletions docs/admin/upgrade/planning.rst
@@ -8,57 +8,82 @@
General Upgrade Guidelines
==========================

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-

Upgrade Planning
================
-Before kicking off an upgrade, there is a set of guidelines to ensure the best outcome. Below you may find the fundamental steps to prepare for an upgrade.
+Before kicking off an upgrade, consider the following steps to prepare for an
+upgrade.

.. NOTE::

-This is not an exhaustive list, so you should consider your organization's specific needs and incorporate any additional steps or considerations that are relevant to your environment.
+This is not an exhaustive list, so you should consider your organization's
+specific needs and incorporate any additional steps or considerations that
+are relevant to your environment.

Acknowledge breaking changes
----------------------------

-Review the :ref:`release notes <crate-reference:release_notes>` and documentation for the target version to understand any potential impact on existing functionality.
-Ensure to review the intermediate versions' documentation also. For example, when upgrading from 4.8 to 5.3, besides reviewing 5.3 release notes, check for version 5.0, 5.1, and so on.
+Review the :ref:`release notes <crate-reference:release_notes>` and documentation
+for the target version to understand any potential impact on existing functionality.
+Ensure to review the intermediate versions' documentation also. For example, when
+upgrading from 4.8 to 5.3, besides reviewing 5.3 release notes, check for version
+5.0, 5.1, and so on.

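The advice to read the notes for every intermediate release can be made mechanical. The Python sketch below assumes you already have the list of released ``major.minor`` versions (the list in the example is illustrative, not an authoritative CrateDB release history) and returns those after the current version, up to and including the target. Versions are compared as integer tuples so that ``5.10`` sorts after ``5.9``.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Turn '5.3' (or '5.3.2') into (5, 3) so versions compare numerically."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def versions_to_review(released: list[str], current: str, target: str) -> list[str]:
    """Released versions newer than `current`, up to and including `target`."""
    low, high = parse_version(current), parse_version(target)
    picked = [v for v in released if low < parse_version(v) <= high]
    return sorted(picked, key=parse_version)


# Illustrative release list only; take the real one from the release notes.
released = ["4.7", "4.8", "5.0", "5.1", "5.2", "5.3", "5.4"]
print(versions_to_review(released, "4.8", "5.3"))  # ['5.0', '5.1', '5.2', '5.3']
```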
Set up a test environment
-------------------------

-Create a test environment that closely resembles your production environment, including the same CrateDB version, hardware, and network configuration. Populate the test environment with representative data and perform thorough testing to ensure compatibility and functionality, including functional and non-functional testing.
+Create a test environment that closely resembles your production environment,
+including the same CrateDB version, hardware, and network configuration.
+Populate the test environment with representative data and perform thorough
+testing to ensure compatibility and functionality, including functional and
+non-functional testing.


Back up and plan recovery
-------------------------

-Perform a cluster-wide backup of your production CrateDB and ensure you have a reliable recovery mechanism in place. Read more in the :ref:`snapshots <crate-reference:snapshot-restore>` documentation.
+Perform a cluster-wide backup of your production CrateDB and ensure you have a
+reliable recovery mechanism in place. Read more in the
+:ref:`snapshots <crate-reference:snapshot-restore>` documentation.

-For the newly written records, you should consider using a mechanism to queue them (e.g. message queue), so these messages can be replayed if needed.
+For the newly written records, you should consider using a mechanism to queue
+them (e.g. message queue), so these messages can be replayed if needed.

.. WARNING::

-Before starting the upgrade process, ensure no backup processes are triggered, so disable any scheduled backup.

+Before starting the upgrade, ensure no backup jobs are started by disabling
+any scheduled backup.

Define a rollback plan
----------------------

-The rollback plan may vary depending on the specific infrastructure and upgrade process in use. It is also essential to adapt this outline to your organization's specific needs and incorporate any additional steps or considerations that are relevant to your environment. A set of steps to serve as an example is listed below:
+The rollback plan may vary depending on the specific infrastructure and upgrade
+process in use. It is also essential to adapt this outline to your organization's
+specific needs and incorporate any additional steps or considerations that are
+relevant to your environment. A set of steps to serve as an example is listed
+below:

-* **Identify the issue:** Determine the specific problem that occurred during the upgrade. This could be related to data corruption, performance degradation, application errors, or any other issue that affects the normal functioning of CrateDB. Identify if there are any potential risks to the system's stability, security, or performance.
+* **Identify the issue:** Determine the specific problem that occurred during
+the upgrade. This could be related to data corruption, performance degradation,
+application errors, or any other issue that affects the normal functioning of
+CrateDB. Identify if there are any potential risks to the system's stability,
+security, or performance.

-* **Communicate the situation:** Notify all relevant stakeholders, including individuals involved in the upgrade process. Clearly explain the problem and the decision to initiate a rollback.
+* **Communicate the situation:** Notify all relevant stakeholders, including
+individuals involved in the upgrade process. Clearly explain the problem and the
+decision to initiate a rollback.

-* **Execute the rollback:** The rollback process may differ depending on the version jump. If upgrading from one patch release to another and there is no data corruption, only a performance issue, a simple in-place downgrade to the previous patch release is sufficient. For major/minor version jumps or in case of data corruption, restoring from a backup is required.
+* **Execute the rollback:** The rollback process may differ depending on the
+version jump. If upgrading from one patch release to another and there is no data
+corruption, only a performance issue, a simple in-place downgrade to the previous
+patch release is sufficient. For major/minor version jumps or in case of data
+corruption, restoring from a backup is required.

-* **Perform data validation:** Conduct a thorough data validation process to ensure the integrity of the CrateDB Cluster. Verify that all critical data is intact and accurate. If needed, replay the messages from the message queue.
+* **Perform data validation:** Conduct a thorough data validation process to
+ensure the integrity of the CrateDB Cluster. Verify that all critical data is
+intact and accurate. If needed, replay the messages from the message queue.

-* **Share insights:** Communicate any findings and the defined plan to retry the upgrade.
+* **Share insights:** Communicate any findings and the defined plan to retry the
+upgrade.

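The rule in the *Execute the rollback* step can be written as a small decision function. This is a hypothetical Python sketch: versions are assumed to be ``(major, minor, patch)`` tuples, ``rollback_action`` is an illustrative name, and the returned strings simply restate the two options from the list; adapt it to your own rollback runbook.

```python
Version = tuple[int, int, int]  # (major, minor, patch)


def rollback_action(old: Version, new: Version, data_corrupted: bool) -> str:
    """Pick a rollback strategy following the rule in the rollback plan."""
    patch_only = old[:2] == new[:2]  # same major and minor, only the patch differs
    if patch_only and not data_corrupted:
        return "in-place downgrade to the previous patch release"
    # Major/minor version jumps, or any data corruption, require the backup.
    return "restore from backup"


print(rollback_action((5, 3, 1), (5, 3, 2), data_corrupted=False))
print(rollback_action((4, 8, 4), (5, 3, 0), data_corrupted=False))
```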


@@ -67,6 +92,6 @@ Upgrade Execution

Choose the upgrade strategy below that works best for your scenario.

-- :ref:`rolling_upgrade`
+- :ref:`rolling_upgrade`

- :ref:`full_restart_upgrade`
6 changes: 0 additions & 6 deletions docs/admin/upgrade/rolling.rst
@@ -5,11 +5,6 @@
Rolling Upgrade
===============

-.. rubric:: Table of contents
-
-.. contents::
-:local:
-
Introduction
============

@@ -286,4 +281,3 @@ again that have been disabled in the first step:

cr> SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'all';
SET OK, 1 row affected (... sec)
-
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -51,6 +51,8 @@
r"https://www.softwareag.com/.*",
# 403 Client Error: Forbidden for url
r"https://dzone.com/.*",
+# 504 Client Error: Gateway Timeout for url
+r"https://web.archive.org/.*",
]
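
The ``conf.py`` hunk adds ``https://web.archive.org/.*`` to what appears to be the list of URLs the Sphinx ``linkcheck`` builder is told to skip. The Python sketch below shows how such a list behaves, assuming the patterns are applied with ``re.match`` (anchored at the start of the URI). ``IGNORED_URL_PATTERNS`` and ``is_ignored`` are hypothetical names; only the three patterns are taken from the hunk.

```python
import re

# The three patterns visible in the conf.py hunk; the variable name is hypothetical.
IGNORED_URL_PATTERNS = [
    r"https://www.softwareag.com/.*",
    r"https://dzone.com/.*",
    r"https://web.archive.org/.*",
]
_COMPILED = [re.compile(pattern) for pattern in IGNORED_URL_PATTERNS]


def is_ignored(uri: str) -> bool:
    """True if any pattern matches at the start of the URI."""
    return any(pattern.match(uri) for pattern in _COMPILED)


print(is_ignored("https://web.archive.org/web/2020/https://example.com/"))  # True
print(is_ignored("https://cratedb.com/docs/"))  # False
```

An unescaped ``.`` in these patterns matches any character, which is harmless for skipping links but worth keeping in mind when writing stricter patterns.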

linkcheck_anchors_ignore_for_url += [