Added quartz scheduler to schedule maintenance job. #17

Merged
merged 7 commits into from
Apr 29, 2025

subkanthi
Collaborator

closes: #1

@subkanthi subkanthi linked an issue Apr 21, 2025 that may be closed by this pull request
setMaintenanceSchedule(cronExpression);
}

private String convertToCronExpression(long interval, TimeUnit timeUnit) {
Collaborator

Not asking to change anything, but skedule (https://github.com/shyiko/skedule?tab=readme-ov-file#format) would give us support for schedules like "every 12 hours" with a lot less code and complexity (literally ~10 lines, see https://github.com/shyiko/skedule?tab=readme-ov-file#example-scheduling-using-scheduledthreadpoolexecutor).

Collaborator Author

Removed Quartz and switched to the skedule library.
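
For readers unfamiliar with the pattern discussed above, here is a minimal JDK-only sketch of a self-rescheduling task on a `ScheduledThreadPoolExecutor`. It is not the code merged in this PR: the class name `SelfReschedulingTask` and the `nextRun` function are hypothetical, and `nextRun` only stands in for whatever a parsed schedule (such as one from skedule) would return as the next run time.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: runs a task at the times produced by a "next run" function. */
final class SelfReschedulingTask {
  private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
  // Assumption: stand-in for a parsed schedule; maps "now" to the next run time.
  private final UnaryOperator<ZonedDateTime> nextRun;
  private final Runnable task;

  SelfReschedulingTask(UnaryOperator<ZonedDateTime> nextRun, Runnable task) {
    this.nextRun = nextRun;
    this.task = task;
    // Drop cancelled tasks from the queue immediately instead of at their due time.
    executor.setRemoveOnCancelPolicy(true);
  }

  /** Milliseconds between now and next, clamped so it is never negative. */
  static long delayMillis(ZonedDateTime now, ZonedDateTime next) {
    return Math.max(0L, Duration.between(now, next).toMillis());
  }

  /** Schedules a single run; the run re-arms the timer once it finishes. */
  void scheduleNext() {
    ZonedDateTime now = ZonedDateTime.now();
    long delay = delayMillis(now, nextRun.apply(now));
    executor.schedule(
        () -> {
          try {
            task.run();
          } finally {
            // Re-arm even if the task threw, unless the executor was shut down.
            if (!executor.isShutdown()) {
              scheduleNext();
            }
          }
        },
        delay,
        TimeUnit.MILLISECONDS);
  }

  void shutdown() {
    executor.shutdownNow();
  }
}
```

With this shape, "every 12 hours" would correspond to something like `new SelfReschedulingTask(now -> now.plusHours(12), job).scheduleNext()`. Computing the delay from the wall clock on every run keeps the schedule from drifting when a run takes a long time.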

@subkanthi
Collaborator Author

Testing

2025-04-23 11:30:07 [er_Worker-1] INFO o.a.iceberg.RemoveSnapshots > Expiring snapshots older than: 2025-03-24T15:30:06.954+00:00 (1742830206954)
2025-04-23 11:30:07 [er_Worker-1] INFO o.a.iceberg.RemoveSnapshots > Committed snapshot changes
2025-04-23 11:30:07 [er_Worker-1] INFO o.a.iceberg.RemoveSnapshots > Cleaning up expired files (local, incremental)
2025-04-23 11:30:08 [er_Worker-1] INFO o.a.i.SnapshotProducer > Committed snapshot 7409259478793169588 (BaseRewriteManifests)
2025-04-23 11:30:08 [er_Worker-1] INFO o.a.iceberg.RemoveSnapshots > Expiring snapshots older than: 2025-03-24T15:30:08.175+00:00 (1742830208175)
2025-04-23 11:30:08 [er_Worker-1] INFO o.a.iceberg.RemoveSnapshots > Committed snapshot changes
2025-04-23 11:30:08 [er_Worker-1] INFO o.a.iceberg.RemoveSnapshots > Cleaning up expired files (local, incremental)
2025-04-23 11:30:08 [er_Worker-1] INFO o.a.i.SnapshotProducer > Committed snapshot 2527995599504485083 (BaseRewriteManifests)
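
The cutoff in the first log line is consistent with "now minus 30 days": the run happened at about 2025-04-23T15:30:06.954Z and the cutoff is 2025-03-24T15:30:06.954Z. The sketch below reproduces that arithmetic. The 30-day value is inferred from the log, not confirmed by the PR, and `SnapshotCutoff` is a hypothetical helper name.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: computes the "expire snapshots older than" timestamp. */
final class SnapshotCutoff {
  /** Returns epoch milliseconds for now minus expirationDays (24-hour days). */
  static long cutoffMillis(Instant now, int expirationDays) {
    return now.minus(expirationDays, ChronoUnit.DAYS).toEpochMilli();
  }
}
```

For `Instant.parse("2025-04-23T15:30:06.954Z")` and 30 days this yields 1742830206954, the value shown in parentheses in the log. A value like this is what would typically be passed to Iceberg's `table.expireSnapshots().expireOlderThan(...)`.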

@subkanthi subkanthi marked this pull request as ready for review April 23, 2025 16:48
scheduleNextMaintenance();
}

public void setMaintenanceMode(boolean enabled) {
Collaborator

This should be private to avoid misuse.

}

public void stopScheduledMaintenance() {
if (currentTask != null) {
Collaborator

This doesn't look thread-safe.

Collaborator Author

Added synchronization on an object.
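
To illustrate the concern: a check-then-act on `currentTask` can race with a concurrent reschedule unless both go through the same lock. The following is only a sketch of one way to guard it, assuming a dedicated lock object. `GuardedTaskHolder` and its methods are hypothetical names and not necessarily what the PR ended up with.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: all reads and writes of currentTask happen under one lock. */
final class GuardedTaskHolder {
  private final Object lock = new Object();
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  private ScheduledFuture<?> currentTask; // guarded by lock

  /** Replaces any pending task with a new one. */
  void start(Runnable task, long delay, TimeUnit unit) {
    synchronized (lock) {
      if (currentTask != null) {
        currentTask.cancel(false);
      }
      currentTask = executor.schedule(task, delay, unit);
    }
  }

  /** Cancels the pending task, if any; safe to call repeatedly. */
  void stop() {
    synchronized (lock) {
      if (currentTask != null) {
        currentTask.cancel(false); // let an in-flight run finish
        currentTask = null;
      }
    }
  }

  boolean isScheduled() {
    synchronized (lock) {
      return currentTask != null && !currentTask.isDone();
    }
  }

  void shutdown() {
    executor.shutdownNow();
  }
}
```

Locking on a private object rather than `this` keeps outside code from interfering with the lock, which is a design preference and not a requirement.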

List<TableIdentifier> tables = catalog.listTables(namespace);
for (TableIdentifier tableIdent : tables) {
int expirationDays = DEFAULT_EXPIRATION_DAYS;
String configuredDays = config.get(Config.OPTION_SNAPSHOT_EXPIRATION_DAYS);
Collaborator

This would be better done in the constructor, with an invalid value resulting in an exception.
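
A sketch of what this suggestion could look like: parse the setting once at construction time, store it in a final field, and fail fast on bad input. The class name, the option key string, and the default of 30 are assumptions for illustration. The real constants live in the project's `Config` class and are not shown in this thread.

```java
import java.util.Map;

/** Hypothetical sketch: validates the expiration setting once, at construction time. */
final class ExpirationSettings {
  static final int DEFAULT_EXPIRATION_DAYS = 30; // assumed default
  // Assumed key; the real one is Config.OPTION_SNAPSHOT_EXPIRATION_DAYS.
  static final String OPTION_SNAPSHOT_EXPIRATION_DAYS = "snapshot.expiration.days";

  private final int expirationDays;

  ExpirationSettings(Map<String, String> config) {
    this.expirationDays = parseDays(config.get(OPTION_SNAPSHOT_EXPIRATION_DAYS));
  }

  /** Returns the default for a missing value and throws for an invalid one. */
  static int parseDays(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_EXPIRATION_DAYS;
    }
    int days;
    try {
      days = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid snapshot expiration days: " + raw, e);
    }
    if (days <= 0) {
      throw new IllegalArgumentException("Snapshot expiration days must be positive: " + raw);
    }
    return days;
  }

  int expirationDays() {
    return expirationDays;
  }
}
```

Because the field is final and set before the object is shared, the per-table loop no longer needs to re-read or re-validate the configuration on every maintenance run.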

logger.info("Next maintenance scheduled for: {}", next);
}

public void setMaintenanceSchedule(String scheduleExpression) {
Collaborator

I'd suggest moving this to the constructor (and making schedule final). Otherwise we have a thread-safety issue here.

@@ -276,6 +284,9 @@ public Integer call() throws Exception {

Catalog catalog = CatalogUtil.buildIcebergCatalog("rest_backend", config, null);

// Initialize and start the maintenance scheduler
initializeMaintenanceScheduler(catalog, config);
Collaborator

It would be nice to make maintenance optional (e.g. disabled when maintenanceInterval is empty).

Collaborator Author

Logic added inside the function:

```java
  private void initializeMaintenanceScheduler(Catalog catalog, Map<String, String> config) {
    if (maintenanceInterval == null || maintenanceInterval.trim().isEmpty()) {
      logger.info("Maintenance scheduler is disabled (no maintenance interval specified)");
      return;
    }
    // ...
  }
```

@subkanthi subkanthi requested a review from shyiko April 24, 2025 20:58
@shyiko
Collaborator

shyiko commented Apr 24, 2025

#17 (comment) may have been lost

@subkanthi
Collaborator Author

> #17 (comment) may have been lost

Sorry, are you referring to the constructor comment? It's done here:


  public MaintenanceScheduler(
      Catalog catalog, Map<String, String> config, String maintenanceInterval) {
    this.catalog = catalog;
    this.executor = new ScheduledThreadPoolExecutor(1);
    ((ScheduledThreadPoolExecutor) executor).setRemoveOnCancelPolicy(true);
    this.schedule = Schedule.parse(maintenanceInterval);
    this.expirationDays = DEFAULT_EXPIRATION_DAYS;
    this.configuredDays = config.get(Config.OPTION_SNAPSHOT_EXPIRATION_DAYS);

Collaborator

@shyiko shyiko left a comment

PR looks good to merge but it does not resolve #1: specifically, it does not appear to delete files that are not referenced by any of the snapshots (e.g. files created during failed inserts).
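
To make the remaining gap concrete: orphan cleanup amounts to listing what is physically stored, subtracting everything that table metadata still references, and applying an age threshold so that files from in-flight writes are not deleted. The sketch below shows only that set logic. `OrphanFileFinder` and its inputs are hypothetical, and collecting the two inputs (a storage listing and the referenced paths) is left out. Iceberg ships a `DeleteOrphanFiles` action that covers this in some engines, though whether it fits ice-rest-catalog is not discussed in this thread.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: picks stored files that no snapshot references. */
final class OrphanFileFinder {
  /**
   * @param storedFiles path to last-modified epoch millis, from a storage listing
   * @param referenced paths still referenced by table metadata
   * @param olderThanMillis only files last modified before this time are candidates
   */
  static Set<String> findOrphans(
      Map<String, Long> storedFiles, Set<String> referenced, long olderThanMillis) {
    Set<String> orphans = new TreeSet<>();
    for (Map.Entry<String, Long> entry : storedFiles.entrySet()) {
      boolean unreferenced = !referenced.contains(entry.getKey());
      boolean oldEnough = entry.getValue() < olderThanMillis;
      if (unreferenced && oldEnough) {
        orphans.add(entry.getKey());
      }
    }
    return orphans;
  }
}
```

The age threshold matters: a file written by an insert that has not committed yet is also unreferenced, so deleting everything unreferenced without a grace period could break concurrent writers.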

@shyiko shyiko merged commit 7414d64 into master Apr 29, 2025
1 check passed
@shyiko shyiko deleted the 1-ice-rest-catalog-automate-catalog-maintenance branch June 3, 2025 18:17
Development

Successfully merging this pull request may close these issues.

ice-rest-catalog: Automate catalog maintenance
2 participants