Skip to content

feat: Support for fault-tolerant execution #779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/02-bug_report.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,11 @@
label: Affected Stackable version
description: Which version of the Stackable Operator do you see this bug in?

# - type: input

Check warning on line 15 in .github/ISSUE_TEMPLATE/02-bug_report.yml

View workflow job for this annotation

GitHub Actions / pre-commit

15:1 [comments-indentation] comment not indented like content
attributes:

Check failure on line 16 in .github/ISSUE_TEMPLATE/02-bug_report.yml

View workflow job for this annotation

GitHub Actions / pre-commit

16:5 [key-duplicates] duplication of key "attributes" in mapping
label: Affected Trino version
description: Which version of Trino do you see this bug in?
#
#

Check warning on line 19 in .github/ISSUE_TEMPLATE/02-bug_report.yml

View workflow job for this annotation

GitHub Actions / pre-commit

19:1 [comments-indentation] comment not indented like content
- type: textarea
attributes:
label: Current and expected behavior
Expand Down
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.

## [Unreleased]

### Added

- Support for fault-tolerant execution ([#779]).

[#779]: https://github.com/stackabletech/trino-operator/pull/779

## [25.7.0] - 2025-07-23

## [25.7.0-rc1] - 2025-07-18
Expand Down
537 changes: 537 additions & 0 deletions deploy/helm/trino-operator/crds/crds.yaml

Large diffs are not rendered by default.

108 changes: 108 additions & 0 deletions docs/modules/trino/examples/usage-guide/fault-tolerant-execution.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
---
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
name: trino-fault-tolerant
spec:
image:
productVersion: "476"
clusterConfig:
catalogLabelSelector:
matchLabels:
trino: trino-fault-tolerant
faultTolerantExecution:
task:
retryAttemptsPerTask: 4
retryInitialDelay: 10s
retryMaxDelay: 60s
retryDelayScaleFactor: 2.0
exchangeDeduplicationBufferSize: 64MB
exchangeManager:
encryptionEnabled: true
sinkBufferPoolMinSize: 20
sinkBuffersPerPartition: 4
sinkMaxFileSize: 2GB
sourceConcurrentReaders: 8
s3:
baseDirectories:
- "s3://trino-exchange-bucket/spooling"
connection:
reference: minio-connection
maxErrorRetries: 10
uploadPartSize: 10MB
coordinators:
roleGroups:
default:
replicas: 1
workers:
roleGroups:
default:
replicas: 3
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
name: minio-connection
spec:
host: minio
port: 9000
accessStyle: Path
credentials:
secretClass: minio-credentials
tls:
verification:
server:
caCert:
secretClass: minio-tls-certificates
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
name: minio-tls-certificates
spec:
backend:
k8sSearch:
searchNamespace:
pod: {}
---
apiVersion: v1
kind: Secret
metadata:
name: minio-tls-certificates
labels:
secrets.stackable.tech/class: minio-tls-certificates
data:
ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUQyVENDQXNHZ0F3SUJBZ0lVTmpxdUdZV3R5SjVhNnd5MjNIejJHUmNNbHdNd0RRWUpLb1pJaHZjTkFRRUwKQlFBd2V6RUxNQWtHQTFVRUJoTUNSRVV4R3pBWkJnTlZCQWdNRWxOamFHeGxjM2RwWnkxSWIyeHpkR1ZwYmpFTwpNQXdHQTFVRUJ3d0ZWMlZrWld3eEtEQW1CZ05WQkFvTUgxTjBZV05yWVdKc1pTQlRhV2R1YVc1bklFRjFkR2h2CmNtbDBlU0JKYm1NeEZUQVRCZ05WQkFNTURITjBZV05yWVdKc1pTNWtaVEFnRncweU16QTJNVFl4TWpVeE1ESmEKR0E4eU1USXpNRFV5TXpFeU5URXdNbG93ZXpFTE1Ba0dBMVVFQmhNQ1JFVXhHekFaQmdOVkJBZ01FbE5qYUd4bApjM2RwWnkxSWIyeHpkR1ZwYmpFT01Bd0dBMVVFQnd3RlYyVmtaV3d4S0RBbUJnTlZCQW9NSDFOMFlXTnJZV0pzClpTQlRhV2R1YVc1bklFRjFkR2h2Y21sMGVTQkpibU14RlRBVEJnTlZCQU1NREhOMFlXTnJZV0pzWlM1a1pUQ0MKQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFOblYvdmJ5M1JvNTdhMnF2UVJubjBqZQplS01VMitGMCtsWk5DQXZpR1VENWJtOGprOTFvUFpuazBiaFFxZXlFcm1EUzRXVDB6ZXZFUklCSkpEamZMMEQ4CjQ2QmU3UGlNS2UwZEdqb3FJM3o1Y09JZWpjOGFMUEhTSWxnTjZsVDNmSXJ1UzE2Y29RZ0c0dWFLaUhGNStlV0YKRFJVTGR1NmRzWXV6NmRLanFSaVVPaEh3RHd0VUprRHdQditFSXRxbzBIK01MRkxMWU0wK2xFSWFlN2RONUNRNQpTbzVXaEwyY3l2NVZKN2xqL0VBS0NWaUlFZ0NtekRSRGNSZ1NTald5SDRibjZ5WDIwMjZmUEl5V0pGeUVkTC82CmpBT0pBRERSMEd5aE5PWHJFZXFob2NTTW5JYlFWcXdBVDBrTWh1WFN2d3Zscm5MeVRwRzVqWm00bFVNMzRrTUMKQXdFQUFhTlRNRkV3SFFZRFZSME9CQllFRkVJM1JNTWl5aUJqeVExUlM4bmxPUkpWZDFwQk1COEdBMVVkSXdRWQpNQmFBRkVJM1JNTWl5aUJqeVExUlM4bmxPUkpWZDFwQk1BOEdBMVVkRXdFQi93UUZNQU1CQWY4d0RRWUpLb1pJCmh2Y05BUUVMQlFBRGdnRUJBSHRLUlhkRmR0VWh0VWpvZG1ZUWNlZEFEaEhaT2hCcEtpbnpvdTRicmRrNEhmaEYKTHIvV0ZsY1JlbWxWNm1Cc0xweU11SytUZDhaVUVRNkpFUkx5NmxTL2M2cE9HeG5CNGFDbEU4YXQrQytUakpBTwpWbTNXU0k2VlIxY0ZYR2VaamxkVlE2eGtRc2tNSnpPN2RmNmlNVFB0VjVSa01lSlh0TDZYYW1FaTU0ckJvZ05ICk5yYStFSkJRQmwvWmU5ME5qZVlidjIwdVFwWmFhWkZhYVNtVm9OSERwQndsYTBvdXkrTWpPYkMzU3BnT3ExSUMKUGwzTnV3TkxWOFZiT3I1SHJoUUFvS21nU05iM1A4dmFUVnV4L1gwWWZqeS9TN045a1BCYUs5bUZqNzR6d1Y5dwpxU1ExNEtsNWpPM1YzaHJHV1laRWpET2diWnJyRVgxS1hFdXN0K1E9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUR5RENDQXJDZ0F3SUJBZ0lVQ0kyUE5OcnR6cDZRbDdHa3VhRnhtRGE2VUJvd0RRWUpLb1pJaHZjTkFRRUwKQlFBd2V6RUxNQWtHQTFVRUJoTUNSRVV4R3pBWkJnTlZCQWdNRWxOamFHeGxjM2RwWnkxSWIyeHpkR1ZwYmpFTwpNQXdHQTFVRUJ3d0ZWMlZrWld3eEtEQW1CZ05WQkFvTUgxTjBZV05yWVdKc1pTQlRhV2R1YVc1bklFRjFkR2h2CmNtbDBlU0JKYm1NeEZUQVRCZ05WQkFNTURITjBZV05yWVdKc1pTNWtaVEFnRncweU16QTJNVFl4TWpVeE1ESmEKR0E4eU1USXpNRFV5TXpFeU5URXdNbG93WGpFTE1Ba0dBMVVFQmhNQ1JFVXhHekFaQmdOVkJBZ01FbE5qYUd4bApjM2RwWnkxSWIyeHpkR1ZwYmpFT01Bd0dBMVVFQnd3RlYyVmtaV3d4RWpBUUJnTlZCQW9NQ1ZOMFlXTnJZV0pzClpURU9NQXdHQTFVRUF3d0ZiV2x1YVc4d2dnRWlNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUIKQVFDanluVnorWEhCOE9DWTRwc0VFWW1qb2JwZHpUbG93d2NTUU4rWURQQ2tCZW9yMFRiODdFZ0x6SksrSllidQpwb1hCbE5JSlBRYW93SkVvL1N6U2s4ZnUyWFNNeXZBWlk0RldHeEp5Mnl4SXh2UC9pYk9HT1l1aVBHWEsyNHQ2ClpjR1RVVmhhdWlaR1Nna1dyZWpXV2g3TWpGUytjMXZhWVpxQitRMXpQczVQRk1sYzhsNVYvK2I4WjdqTUppODQKbU9mSVB4amt2SXlKcjVVa2VGM1VmTHFKUzV5NExGNHR5NEZ0MmlBZDdiYmZIYW5mdlltdjZVb0RWdE1YdFdvMQpvUVBmdjNzaFdybVJMenc2ZXVJQXRiWGM1Q2pCeUlha0NiaURuQVU4cktnK0IxSjRtdlFnckx3bzNxUHJ5Smd4ClNkaWRtWjJtRVI3RXorYzVCMG0vTGlJaEFnTUJBQUdqWHpCZE1Cc0dBMVVkRVFRVU1CS0NCVzFwYm1sdmdnbHMKYjJOaGJHaHZjM1F3SFFZRFZSME9CQllFRkpRMGdENWtFdFFyK3REcERTWjdrd1o4SDVoR01COEdBMVVkSXdRWQpNQmFBRkVJM1JNTWl5aUJqeVExUlM4bmxPUkpWZDFwQk1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQmNkaGQrClI0Sm9HdnFMQms1OWRxSVVlY2N0dUZzcmRQeHNCaU9GaFlOZ1pxZWRMTTBVTDVEenlmQUhmVk8wTGZTRURkZFgKUkpMOXlMNytrTVUwVDc2Y3ZkQzlYVkFJRTZIVXdUbzlHWXNQcXN1eVpvVmpOcEVESkN3WTNDdm9ubEpWZTRkcQovZ0FiSk1ZQitUU21ZNXlEUHovSkZZL1haellhUGI3T2RlR3VqYlZUNUl4cDk3QXBTOFlJaXY3M0Mwd1ViYzZSCmgwcmNmUmJ5a1NRVWg5dmdWZFhSU1I4RFQzV0NmZHFOek5CWVh2OW1xZlc1ejRzYkdqK2wzd1VsL0kzRi9tSXcKZnlPNEN0aTRha2lHVkhsZmZFeTB3a3pWYUJ4aGNYajJJM0JVVGhCNFpxamxzc2llVmFGa3d2WG1teVJUMG9FVwo1SCtOUEhjcXVTMXpQc2NsCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2QUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktZd2dnU2lBZ0VBQW9JQkFRQ2p5blZ6K1hIQjhPQ1kKNHBzRUVZbWpvYnBkelRsb3d3Y1NRTitZRFBDa0Jlb3IwVGI4N0VnTHpKSytKWWJ1cG9YQmxOSUpQUWFvd0pFbwovU3pTazhmdTJYU015dkFaWTRGV0d4SnkyeXhJeHZQL2liT0dPWXVpUEdYSzI0dDZaY0dUVVZoYXVpWkdTZ2tXCnJlaldXaDdNakZTK2MxdmFZWnFCK1ExelBzNVBGTWxjOGw1Vi8rYjhaN2pNSmk4NG1PZklQeGprdkl5SnI1VWsKZUYzVWZMcUpTNXk0TEY0dHk0RnQyaUFkN2JiZkhhbmZ2WW12NlVvRFZ0TVh0V28xb1FQZnYzc2hXcm1STHp3NgpldUlBdGJYYzVDakJ5SWFrQ2JpRG5BVThyS2crQjFKNG12UWdyTHdvM3FQcnlKZ3hTZGlkbVoybUVSN0V6K2M1CkIwbS9MaUloQWdNQkFBRUNnZ0VBQWQzdDVzdUNFMjdXY0llc3NxZ3NoSFAwZHRzKyswVzF6K3h6WC8xTnhPRFkKWVhWNkJmbi9mRHJ4dFQ4aVFaZ2VVQzJORTFQaHZveXJXdWMvMm9xYXJjdEd1OUFZV29HNjJLdG9VMnpTSFdZLwpJN3VERTFXV2xOdlJZVFdOYW5DOGV4eGpRRzE4d0RKWjFpdFhTeEl0NWJEM3lrL3dUUlh0dCt1SnpyVjVqb2N1CmNoeERMd293aXUxQWo2ZFJDWk5CejlUSnh5TnI1ME5ZVzJVWEJhVC84N1hyRkZkSndNVFZUMEI3SE9uRzdSQlYKUWxLdzhtcVZiYU5lbmhjdk1qUjI5c3hUekhSK2p4SU8zQndPNk9Hai9PRmhGQllVN1RMWGVsZDFxb2UwdmIyRwpiOGhQcEd1cHRyNUF0OWx3MXc1d1EzSWdpdXRQTkg1cXlEeUNwRWw2RVFLQmdRRGNkYnNsT2ZLSmo3TzJMQXlZCkZ0a1RwaWxFMFYzajBxbVE5M0lqclY0K0RSbUxNRUIyOTk0MDdCVVlRUWoxL0RJYlFjb1oyRUVjVUI1cGRlSHMKN0RNRUQ2WExIYjJKVTEyK2E3c1d5Q05kS2VjZStUNy9JYmxJOFR0MzQwVWxIUTZ6U01TRGNqdmZjRkhWZ3YwcwpDYWpoRng3TmtMRVhUWnI4ZlQzWUloajR2UUtCZ1FDK01nWjFVbW9KdzlJQVFqMnVJVTVDeTl4aldlWURUQU8vCllhWEl6d2xnZTQzOE1jYmI0Y04yU2FOU0dEZ1Y3bnU1a3FpaWhwalBZV0lpaU9CcDlrVFJIWE9kUFc0N3N5ZUkKdDNrd3JwMnpWbFVnbGNNWlo2bW1WM1FWYUFOWmdqVTRSU3Y0ZS9WeFVMamJaYWZqUHRaUnNqWkdwSzBZVTFvdApWajhJZVE3Zk5RS0JnQ1ArWk11ekpsSW5VQ1FTRlF4UHpxbFNtN0pNckpPaHRXV2h3TlRxWFZTc050dHV5VmVqCktIaGpneDR1b0JQcFZSVDJMTlVEWmI0RnByRjVPYVhBK3FOVEdyS0s3SU1iUlZidHArSVVVeEhHNGFGQStIUVgKUVhVVFRhNUpRT1RLVmJnWHpWM1lyTVhTUk1valZNcDMyVWJHeTVTc1p2MXpBamJ2QzhYWjYxSFJBb0dBZEJjUQp2aGU1eFpBUzVEbUtjSGkvemlHa3ViZXJuNk9NUGdxYUtJSEdsVytVOExScFR0ajBkNFRtL1Rydk1PUEovVEU1CllVcUtoenBIcmhDaCtjdHBvY0k2U1dXdm5SenpLbzNpbVFaY0Y1VEFqUTBjY3F0RmI5UzlkRHR5bi9YTUNqYWUKYWlNdll5VUVVRll5TFpDelBGWnNycDNoVVpHKzN5RmZoQXB3TzJrQ2dZQkh3WWFQSWRXNld3NytCMmhpbjBvdwpqYTNjZXN2QTRqYU1Qd1NMVDhPTnRVMUdCU01md2N6TWJuUEhMclJ2Qjg3bjlnUGFSMndRR1VtckZFTzNMUFgvCmtSY09HcFlCSHBEWEVqRGhLa1dkUnVMT0ZnNEhMWmRWOEFOWmxRMFZTY0U4dTNkRERVTzg5cEdEbjA4cVRBcmwKeDlreHN1ZEVWcmtlclpiNVV4RlZxUT09Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
name: minio-credentials
spec:
backend:
k8sSearch:
searchNamespace:
pod: {}
---
apiVersion: v1
kind: Secret
metadata:
name: minio-credentials-secret
labels:
secrets.stackable.tech/class: minio-credentials
stringData:
accessKey: minio-access-key
secretKey: minio-secret-key
---
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
name: tpch
labels:
trino: trino-fault-tolerant
spec:
connector:
tpch: {}

3 changes: 3 additions & 0 deletions docs/modules/trino/pages/usage-guide/configuration.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@ For a role or role group, at the same level of `config`, you can specify `config

For a list of possible configuration properties consult the https://trino.io/docs/current/admin/properties.html[Trino Properties Reference].

TIP: For fault-tolerant execution configuration, use the dedicated `faultTolerantExecution` section in the cluster configuration instead of `configOverrides`.
See xref:usage-guide/fault-tolerant-execution.adoc[] for detailed instructions.

[source,yaml]
----
workers:
Expand Down
213 changes: 213 additions & 0 deletions docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@
= Fault-tolerant execution
:description: Configure fault-tolerant execution in Trino clusters for improved query resilience and automatic retry capabilities.
:keywords: fault-tolerant execution, retry policy, exchange manager, spooling, query resilience

Fault-tolerant execution is a mechanism in Trino that enables a cluster to mitigate query failures by retrying queries or their component tasks in the event of failure.
With fault-tolerant execution enabled, intermediate exchange data is spooled and can be re-used by another worker in the event of a worker outage or other fault during query execution.

By default, if a Trino node lacks the resources to execute a task or otherwise fails during query execution, the query fails and must be run again manually.
The longer the runtime of a query, the more likely it is to be susceptible to such failures.

NOTE: Fault tolerance does not apply to broken queries or other user error.
For example, Trino does not spend resources retrying a query that fails because its SQL cannot be parsed.

Take a look at the link:https://trino.io/docs/current/admin/fault-tolerant-execution.html[Trino documentation for fault-tolerant execution {external-link-icon}^] to learn more.

== Configuration

Fault-tolerant execution is not enabled by default.
To enable the feature, you need to configure it in your `TrinoCluster` resource by adding a `faultTolerantExecution` section to the cluster configuration.
The configuration uses a structured approach where you choose either `query` or `task` retry policy, each with their specific configuration options.

=== Query retry policy

A `query` retry policy instructs Trino to automatically retry a query in the event of an error occurring on a worker node.
This policy is recommended when the majority of the Trino cluster's workload consists of many small queries.

By default, Trino does not implement fault tolerance for queries whose result set exceeds 32MB in size.
This limit can be increased by modifying the `exchangeDeduplicationBufferSize` configuration property to be greater than the default value of `32MB`, but this results in higher memory usage on the coordinator.

[source,yaml]
----
spec:
clusterConfig:
faultTolerantExecution:
query:
retryAttempts: 3
exchangeDeduplicationBufferSize: 64MB # Increased from default 32MB
----

=== Task retry policy

A `task` retry policy instructs Trino to retry individual query tasks in the event of failure.
You **must** configure an exchange manager to use the task retry policy.
This policy is recommended when executing large batch queries, as the cluster can more efficiently retry smaller tasks within the query rather than retry the whole query.

IMPORTANT: A `task` retry policy is best suited for long-running queries, but this policy can result in higher latency for short-running queries executed in high volume.
As a best practice, it is recommended to run a dedicated cluster with a `task` retry policy for large batch queries, separate from another cluster that handles short queries.
There are tools that can help you achieve this by automatically routing queries based on certain criteria (such as query estimates or user) to different Trino clusters. Notable mentions are link:https://github.com/stackabletech/trino-lb[trino-lb {external-link-icon}^] and link:https://github.com/trinodb/trino-gateway[trino-gateway {external-link-icon}^].

[source,yaml]
----
spec:
clusterConfig:
faultTolerantExecution:
task:
retryAttemptsPerTask: 4
exchangeManager: # Mandatory for Task retry policy
encryptionEnabled: true
s3:
baseDirectories:
- "s3://trino-exchange-bucket/spooling"
connection:
reference: my-s3-connection # <1>
----
<1> Reference to an xref:concepts:s3.adoc[S3Connection] resource

== Exchange manager

Exchange spooling is responsible for storing and managing spooled data for fault-tolerant execution.
You can configure a filesystem-based exchange manager that stores spooled data in a specified location, such as AWS S3 and S3-compatible systems, HDFS, or local filesystem.

NOTE: An exchange manager is required when using the `task` retry policy and optional for the `query` retry policy.

=== S3-compatible storage

You can use S3-compatible storage systems for exchange spooling, including AWS S3 and MinIO.

[source,yaml]
----
spec:
clusterConfig:
faultTolerantExecution:
task:
retryAttemptsPerTask: 4
exchangeManager:
s3:
baseDirectories: # <1>
- "s3://exchange-bucket-1/trino-spooling"
connection:
reference: minio-s3-connection # <2>
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
name: minio-s3-connection
spec:
host: minio.default.svc.cluster.local
port: 9000
accessStyle: Path
credentials:
secretClass: minio-secret-class
tls:
verification:
server:
caCert:
secretClass: tls
----
<1> Multiple S3 buckets can be specified to distribute I/O load
<2> S3 connection defined as a reference to an xref:concepts:s3.adoc[S3Connection] resource

For storage systems like Google Cloud Storage or Azure Blob Storage, you can use the S3-compatible configuration with `configOverrides` to provide the necessary exchange manager properties.

=== HDFS storage

You can configure HDFS as the exchange spooling destination:

[source,yaml]
----
spec:
clusterConfig:
faultTolerantExecution:
task:
retryAttemptsPerTask: 4
exchangeManager:
hdfs:
baseDirectories:
- "hdfs://simple-hdfs/exchange-spooling"
hdfs:
configMap: simple-hdfs # <1>
----
<1> ConfigMap containing HDFS configuration files (created by the HDFS operator)

=== Local filesystem storage

Local filesystem storage is supported but only recommended for development or single-node deployments:

WARNING: It is only recommended to use a local filesystem for exchange in standalone, non-production clusters.
A local directory can only be used for exchange in a distributed cluster if the exchange directory is shared and accessible from all nodes.

[source,yaml]
----
spec:
clusterConfig:
faultTolerantExecution:
task:
exchangeManager:
local:
baseDirectories:
- "/trino-exchange"
coordinators:
roleGroups:
default:
replicas: 1
podOverrides:
spec:
volumes:
- name: trino-exchange
persistentVolumeClaim:
claimName: trino-exchange-pvc
containers:
- name: trino
volumeMounts:
- name: trino-exchange
mountPath: /trino-exchange
workers:
roleGroups:
default:
replicas: 1
podOverrides:
spec:
volumes:
- name: trino-exchange
persistentVolumeClaim:
claimName: trino-exchange-pvc
containers:
- name: trino
volumeMounts:
- name: trino-exchange
mountPath: /trino-exchange
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: trino-exchange-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
----

== Connector support

Support for fault-tolerant execution of SQL statements varies on a per-connector basis.
Take a look at the link:https://trino.io/docs/current/admin/fault-tolerant-execution.html#configuration[Trino documentation {external-link-icon}^] to see which connectors support fault-tolerant execution.

When using connectors that do not explicitly support fault-tolerant execution, you may encounter a "This connector does not support query retries" error message.

== Example

Here's an example of a Trino cluster with fault-tolerant execution enabled using the `task` retry policy and MinIO backed S3 as the exchange manager:

[source,bash]
----
stackablectl operator install commons secret listener trino
helm install minio minio --repo https://charts.bitnami.com/bitnami --version 15.0.7 --set auth.rootUser=minio-access-key --set auth.rootPassword=minio-secret-key --set tls.enabled=true --set tls.existingSecret=minio-tls-certificates --set provisioning.enabled=true --set provisioning.buckets[0].name=trino-exchange-bucket
----

[source,yaml]
----
include::example$usage-guide/fault-tolerant-execution.yaml[]
----
Original file line number Diff line number Diff line change
Expand Up @@ -80,10 +80,12 @@ spec:
All queries that take less than the minimal graceful shutdown period of all roleGroups (`1` hour as a default) are guaranteed to not be disturbed by regular termination of Pods.
They can obviously still fail when, for example, a Kubernetes node dies or gets rebooted before it is fully drained.

Because of this, the operator automatically restricts the execution time of queries to the minimal graceful shutdown period of all roleGroups using the Trino configuration `query.max-execution-time=3600s`.
Because of this, the operator automatically restricts the execution time of queries to the minimal graceful shutdown period of all roleGroups using the Trino configuration `query.max-execution-time=3600s` when xref:usage-guide/fault-tolerant-execution.adoc[fault tolerant execution] is not configured.
This causes all queries that take longer than 1 hour to fail with the error message `Query failed: Query exceeded the maximum execution time limit of 3600s.00s`.

In case you need to execute queries that take longer than the configured graceful shutdown period, you need to increase the `query.max-execution-time` property as follows:
However, when xref:usage-guide/fault-tolerant-execution.adoc[fault tolerant execution] is enabled, the `query.max-execution-time` restriction is not applied since queries can be automatically retried in case of failures, allowing them to run indefinitely without being cancelled by worker restarts.

In case you need to execute queries that take longer than the configured graceful shutdown period and do not want to configure fault tolerant execution, you can increase the `query.max-execution-time` property as follows:

[source,yaml]
----
Expand All @@ -95,8 +97,6 @@ spec:
----

Keep in mind, that queries taking longer than the graceful shutdown period are now subject to failure when a Trino worker gets shut down.
Running into this issue can be circumvented by using https://trino.io/docs/current/admin/fault-tolerant-execution.html[Fault-tolerant execution], which is not supported natively yet.
Until native support is added, you will have to use `configOverrides` to enable it.

== Authorization requirements

Expand Down
Loading
Loading