Commit fada90d

Add documentation

Adds docs/content/guides/tablespaces Issue: [sc-17759] Co-authored-by: Tony Landreth <[email protected]>

1 parent 13e5315 commit fada90d

docs/content/guides/tablespaces.md (307 additions, 0 deletions)
---
title: "Tablespaces in PGO"
date:
draft: false
weight: 160
---

{{% notice warning %}}
PGO tablespaces is currently in `Alpha` and may interfere with other features.
(See below for more details.)
{{% /notice %}}

A [Tablespace](https://www.postgresql.org/docs/current/manage-ag-tablespaces.html)
is a Postgres feature used to store data on a different volume than the
primary data directory. While most workloads do not require tablespaces, they can
be helpful for larger data sets or for placing a particular Postgres object (a
table, index, etc.) on hardware chosen to optimize its performance. Some examples
of use cases for tablespaces include:

- Partitioning larger data sets across different volumes
- Putting data onto archival systems
- Utilizing faster or more performant hardware (or a storage class) for a particular database
- Storing sensitive data on a volume that supports transparent data encryption (TDE)

and others.

In order to use Postgres tablespaces properly in a highly available,
distributed system, there are several considerations to ensure proper operation:

- Each tablespace must have its own volume; this means that every tablespace for
  every replica in a system must have its own volume.
- The filesystem map must be consistent across the cluster.
- The backup and disaster recovery management system must be able to safely back up
  and restore data to tablespaces.

Additionally, a tablespace is a critical piece of a Postgres instance: if
Postgres expects a tablespace to exist and the tablespace volume is unavailable,
this could trigger a downtime scenario.

While there are certain challenges with creating a highly available Postgres
cluster with tablespaces in a Kubernetes-based environment, the
Postgres Operator adds many conveniences to make it easier to use tablespaces.

## Enabling TablespaceVolumes in PGO v5

In PGO v5, tablespace support is currently in `Alpha`. If you want to use this
experimental feature, you will need to enable it via the PGO `TablespaceVolumes`
[feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/).

PGO feature gates are enabled by setting the `PGO_FEATURE_GATES` environment
variable on the PGO Deployment. To enable tablespaces, you would set

```
PGO_FEATURE_GATES="TablespaceVolumes=true"
```

Please note that it is possible to enable more than one feature at a time, as
this variable accepts a comma-delimited list. For example, to enable multiple features,
you would set `PGO_FEATURE_GATES` like so:

```
PGO_FEATURE_GATES="FeatureName=true,FeatureName2=true,FeatureName3=true..."
```

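If you manage the operator with plain manifests, one way to set this variable is directly in the PGO Deployment spec. This is only a sketch: the container name below is illustrative and will vary by install method.

```yaml
# Excerpt from the PGO Deployment spec (container name is illustrative)
spec:
  template:
    spec:
      containers:
      - name: operator
        env:
        - name: PGO_FEATURE_GATES
          value: "TablespaceVolumes=true"
```
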
## Using TablespaceVolumes in PGO v5

Once you have enabled `TablespaceVolumes` on your PGO deployment, you can add tablespace
volumes to a new or existing cluster through the `spec.instances.tablespaceVolumes` field.

A `TablespaceVolume` object has two fields: a name (which is required and used to set the path)
and a `dataVolumeClaimSpec`, which describes the storage that your Postgres instance will use
for this volume. This field behaves identically to the `dataVolumeClaimSpec` in the `instances`
list. For example, you could use the following to create a `postgrescluster`:

```yaml
spec:
  instances:
  - name: instance1
    dataVolumeClaimSpec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi
    tablespaceVolumes:
    - name: user
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
```
94+
95+
In this case, the `postgrescluster` will have 1Gi for the database volume and 1Gi for the tablespace
96+
volume, and both will be provisioned by PGO.
97+
98+
But if you were attempting to migrate data from one `postgrescluster` to another, you could re-use
99+
pre-existing volumes by passing in some label selector or the `volumeName` into the
100+
`tablespaceVolumes.dataVolumeClaimSpec` the same way you would pass that information into the
101+
`instances.dataVolumeClaimSpec` field:
102+
103+
```yaml
104+
spec:
105+
instances:
106+
- name: instance1
107+
dataVolumeClaimSpec:
108+
volumeName: pvc-1001c17d-c137-4f78-8505-be4b26136924 # A preexisting volume you want to reuse for PGDATA
109+
accessModes:
110+
- "ReadWriteOnce"
111+
resources:
112+
requests:
113+
storage: 1Gi
114+
tablespaceVolumes:
115+
- name: user
116+
dataVolumeClaimSpec:
117+
accessModes:
118+
- "ReadWriteOnce"
119+
resources:
120+
requests:
121+
storage: 1Gi
122+
volumeName: pvc-3fea1531-617a-4fff-9032-6487206ce644 # A preexisting volume you want to use for this tablespace
123+
```
124+
125+
Note: the `name` of the `tablespaceVolume` needs to be unique in the instance since
126+
that name becomes part of the mount path for that volume.
127+
128+
Once you request those `tablespaceVolumes`, PGO takes care of creating (or reusing) those volumes,
129+
including mounting them to the pod at a known path (`/tablespaces/NAME`) and adding them to the
130+
necessary containers.
131+
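For reference, a tablespace volume named `user` from the examples above ends up mounted in the database container roughly as follows. This is only a sketch: the volume name PGO generates is illustrative here; only the `mountPath` follows the documented `/tablespaces/NAME` convention.

```yaml
# Sketch of the resulting container mount (volume name is illustrative)
volumeMounts:
- name: tablespace-user
  mountPath: /tablespaces/user
```
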
### How to use Postgres Tablespaces in PGO v5

After PGO has mounted the volumes at the requested locations, the startup container makes sure
that those locations have the appropriate owner and permissions. This mimics the startup
behavior for the `PGDATA` directory, so that when you connect to your cluster, you should be
able to start using those tablespaces.

In order to use those tablespaces in Postgres, you will first need to create the tablespace,
including the location. As noted above, PGO mounts the requested volumes at `/tablespaces/NAME`.
So if you request tablespaces with the names `books` and `authors`, the two volumes will be
mounted at `/tablespaces/books` and `/tablespaces/authors`.

However, in order to make sure that the directory has the appropriate ownership so that Postgres
can use it, we create a subdirectory called `data` in each volume.

To create a tablespace in Postgres, you will issue a command of the form

```sql
CREATE TABLESPACE name LOCATION '/path/to/dir';
```

So to create a tablespace called `books` in the new `books` volume, your command might look like

```sql
CREATE TABLESPACE books LOCATION '/tablespaces/books/data';
```

To break that path down: `tablespaces` is the mount point for all tablespace volumes; `books`
is the name of the volume in the spec; and `data` is a directory created with the appropriate
ownership by the startup script.

Once you have

* enabled the `TablespaceVolumes` feature gate,
* added `tablespaceVolumes` to your cluster spec,
* and created the tablespace in Postgres,

then you are ready to use tablespaces in your cluster. For example, if you wanted to create a
table called `books` on the `books` tablespace, you could execute the following SQL:

```sql
CREATE TABLE books (
  book_id varchar(20),
  title varchar(50),
  author_last_name varchar(30)
)
TABLESPACE books;
```
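
If you want to double-check that a tablespace was created where you expect, you can query the standard `pg_tablespace` catalog:

```sql
-- List all tablespaces and their on-disk locations;
-- the built-in tablespaces (pg_default, pg_global) report an empty location.
SELECT spcname, pg_tablespace_location(oid) AS location
FROM pg_tablespace;
```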

## Considerations

### Only one pod per volume

As stated above, it is important to ensure that every tablespace has its own volume
(i.e., its own [persistent volume claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)).
This is especially true for any replicas in a cluster: you don't want multiple Postgres instances
writing to the same volume.

So if you have a single named volume in your spec (for either the main `PGDATA` directory or
for tablespaces), you should not raise the `spec.instances.replicas` field above 1, because if you
did, multiple pods would try to use the same volume.

### Too-long names?

Different Kubernetes objects have different limits on the length of their names. For example,
Services follow the DNS label conventions: 63 characters or fewer, lowercase, and alphanumeric with
hyphens (U+002D) allowed in between.

Occasionally some PGO-managed objects will exceed the limit set for that object type because of
the user-set cluster or instance name.

We do not anticipate this being a problem with the `PersistentVolumeClaim` created for a tablespace.
The name of a `PersistentVolumeClaim` created by PGO for a tablespace can be long, since it is a
combination of the cluster, the instance, the tablespace, and the `-tablespace` suffix.
However, a `PersistentVolumeClaim` name can be up to 253 characters in length.

### Same tablespace volume names across replicas

We want to make sure that every pod has a consistent filesystem because Postgres expects
the same path on each replica.

For instance, imagine on your primary Postgres, you add a tablespace with the location
`/tablespaces/kafka/data`. If you have a replica attached to that primary, it will likewise
try to add a tablespace at the location `/tablespaces/kafka/data`; and if that location doesn't
exist on the replica's filesystem, Postgres will rightly complain.

Therefore, if you expand your `postgrescluster` with multiple instances, you will need to make
sure that the multiple instances have `tablespaceVolumes` with the *same names*, like so:

```yaml
spec:
  instances:
  - name: instance1
    dataVolumeClaimSpec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi
    tablespaceVolumes:
    - name: user
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
  - name: instance2
    dataVolumeClaimSpec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi
    tablespaceVolumes:
    - name: user
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
```

### Tablespace backups

PGO uses `pgBackRest` as our backup solution, and `pgBackRest` is built to work with tablespaces
natively. That is, `pgBackRest` should back up the entire database, including tablespaces, without
any additional work on your part.

**Note**: `pgBackRest` does not itself use tablespaces, so all the backups will go to a single volume.
One of the primary uses of tablespaces is to relieve disk pressure by separating the database among
multiple volumes, but if you are running out of room on your `pgBackRest` persistent volume,
tablespaces will not help, and you should first solve your backup space problem.

### Adding tablespaces to existing clusters

As with other changes made to the definition of a Postgres pod, adding `tablespaceVolumes` to an
existing cluster may cause downtime. The act of mounting a new PVC to a Kubernetes Deployment
causes the Pods in the deployment to restart.

### Restoring from a cluster with tablespaces

This functionality has not been fully tested. Enjoy!

### Removing tablespaces

Removing a tablespace is a nontrivial operation. Postgres does not provide a
`DROP TABLESPACE .. CASCADE` command that would drop any objects associated with a tablespace.
Additionally, the Postgres documentation covering the
[`DROP TABLESPACE`](https://www.postgresql.org/docs/current/sql-droptablespace.html)
command goes on to note:

> A tablespace can only be dropped by its owner or a superuser. The tablespace
> must be empty of all database objects before it can be dropped. It is possible
> that objects in other databases might still reside in the tablespace even if
> no objects in the current database are using the tablespace. Also, if the
> tablespace is listed in the temp_tablespaces setting of any active session,
> the DROP might fail due to temporary files residing in the tablespace.

Because of this, and to avoid leaving a Postgres cluster in an inconsistent
state through a failed tablespace removal, PGO does not provide any means to remove tablespaces
automatically. If you need to remove a tablespace from a Postgres deployment, we recommend
following this procedure:

1. As a database administrator:
    1. Log into the primary instance of your cluster.
    1. Drop any objects (tables, indexes, etc.) that reside within the tablespace you wish to delete.
    1. Delete this tablespace from the Postgres cluster using the `DROP TABLESPACE` command.
1. As a Kubernetes user who can modify `postgrescluster` specs:
    1. Remove the `tablespaceVolumes` entries for the tablespaces you wish to remove.
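
The database-administrator steps can be sketched as follows, assuming a tablespace named `books` that contains a single `books` table (all names here are illustrative):

```sql
-- 1. Drop (or relocate) every object that resides in the tablespace
DROP TABLE books;
-- Alternatively, move objects out instead of dropping them:
-- ALTER TABLE books SET TABLESPACE pg_default;

-- 2. Drop the now-empty tablespace
DROP TABLESPACE books;
```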

## More Information

For more information on how tablespaces work in Postgres, please refer to the
[Postgres manual](https://www.postgresql.org/docs/current/manage-ag-tablespaces.html).
