Skip to content

Retention-based Partition Dropping #44

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 28, 2024
Merged

Retention-based Partition Dropping #44

merged 10 commits into from
Feb 28, 2024

Conversation

jcjones
Copy link
Collaborator

@jcjones jcjones commented Apr 6, 2022

Determine which partitions are wholly outside of the configured retention_period for a given table and emit commands to drop them.

I'd just like to write for posterity that this is getting real hacky, even more so now with this branch rebased on top of the last year's worth of fixes.

To give just a hint of the deltas of the current version vs. what it was designed to do:

Partman gathers the Partition Table of each table, and attaches to each Partition the beginning and end positions, and then knowledge of how to derive the beginning and end timestamps. However, we know now that because MariaDB selects Partitions based on an incoming row matching any part of a partition, the partition offsets in the partition table are just guideposts. In reality, one must SELECT ... FROM table PARTITION(partition_name) ... to have actual truth of what the positions of the beginning and end of the partition. Deriving it from the table is dangerous, because it works until it doesn't, and that's where #64 comes from.

Partman also then assumed in the beginning that getting the start and end timestamps for a partition could come from the partition names, which leads to all these ALTER statements to keep the partition names roughly true. But that's hard to keep up with, and if we have a narrow set of partitions it ends up with giving very few data points from which to calculate a rate of change. But all the methods in Partman to calculate this stuff does so from the perspective of "calculate the rate of change of these partitions", but in reality we shouldn't care about the partitions at all. We should instead use the primary key, select the min, max, and then sample points between the two to get a better rate of change. Basically, we assumed in the beginning we could do this without earliest_utc_timestamp_query, but now that's basically mandatory. And we should use that.

And these decisions are baked into the myriad Partition types in types.py, which are all far too complicated.

Anyway, this thing needs a rewrite to become much simpler.

@jcjones jcjones force-pushed the age_analysis branch 3 times, most recently from 1ec5067 to 34755cb Compare April 7, 2022 17:26
@jcjones jcjones marked this pull request as ready for review April 28, 2023 22:18
Copy link

@kruti-s kruti-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some suggestions about potentially overlooked errors and test cases

@jcjones jcjones merged commit b271785 into main Feb 28, 2024
@jcjones jcjones deleted the age_analysis branch February 28, 2024 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants