
Let bootstrap work on tables that aren't partitioned yet. #10


Closed
wants to merge 54 commits into from

Conversation

jcjones
Collaborator

@jcjones jcjones commented Apr 28, 2021

Add a --assume-partitioned-on flag to bootstrap, to facilitate operation on
tables that do not yet have a partition map. One needs to supply
--assume-partitioned-on COLUMN_NAME as many times as needed to identify all
the columns which will be part of the partition expression.
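
A repeatable flag like this is commonly built with argparse's `append` action. The following is a minimal sketch, not the actual cli.py wiring: only the flag name comes from this PR, while the parser layout, the `parse_columns` helper, and the example column names are assumptions for illustration.

```python
import argparse


def parse_columns(argv):
    """Parse a hypothetical 'bootstrap' command line and return the column list."""
    parser = argparse.ArgumentParser(prog="partition-manager")  # prog name is illustrative
    subparsers = parser.add_subparsers(dest="command")
    bootstrap = subparsers.add_parser("bootstrap")
    bootstrap.add_argument(
        "--assume-partitioned-on",
        action="append",  # each occurrence appends one more column name
        metavar="COLUMN_NAME",
        help="column in the future partition expression; repeat once per column",
    )
    args = parser.parse_args(argv)
    # With action="append" the attribute is None when the flag is never given.
    return args.assume_partitioned_on or []


columns = parse_columns(
    ["bootstrap", "--assume-partitioned-on", "id", "--assume-partitioned-on", "serial"]
)
print(columns)  # ['id', 'serial']
```

The order of the flags is preserved, which matters if the column order is meant to match the order of the partition expression.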

aarongable and others added 30 commits April 20, 2021 17:15
- This moves from the -E to -X XML-based CLI and parses it
This removes any dependency on the auto-increment value, as AI can only tell
us a single column's position, and for multi-column partitions we'll need
more than that.

This change does not totally remove the get_autoincrement method, as we'll
still want to confirm the table has the partitioned feature, which we'll do
in a next commit.
This adds the necessary retention options for later command processing, which
is not included in this commit.
There are other BY RANGE options too, but let's just start with COLUMNS
jcjones and others added 22 commits April 20, 2021 17:17
* First pass algorithm

* Add ability to compare partition positions

* Add a split method for dividing partition lists

* More tests

* Add a position rate function

* Add methods to determine a weighted rate of increase

* Add docs to the new table_append_partition methods

* Use the Partition timestamp() method

* plan_partition_changes algorithm

* More partition planning tests

* Predictive partitioning algorithm functioning in tests

* Rework the CLI to use the new partition planning algorithm

* Passing integration tests

* Handle short and bespoke partition names.

* Improve logging

* Remove spurious strip

* Moving to 0.2.0

* Logging cleanups

* Fix a host of pylint issues

$ pylint --ignore-patterns=.*_test.py partitionmanager/ --disable W1203 --disable invalid-name --disable bad-continuation

************* Module partitionmanager.tools
partitionmanager/tools.py:22:11: R1708: Do not raise StopIteration in generator, use return statement instead (stop-iteration-return)
************* Module partitionmanager.sql
partitionmanager/sql.py:36:0: R0903: Too few public methods (1/2) (too-few-public-methods)
************* Module partitionmanager.stats
partitionmanager/stats.py:12:0: R0903: Too few public methods (0/2) (too-few-public-methods)
partitionmanager/stats.py:65:0: R0912: Too many branches (14/12) (too-many-branches)
************* Module partitionmanager.table_append_partition
partitionmanager/table_append_partition.py:98:0: R0914: Too many local variables (16/15) (too-many-locals)
partitionmanager/table_append_partition.py:306:0: R0914: Too many local variables (23/15) (too-many-locals)

------------------------------------------------------------------
Your code has been rated at 9.92/10 (previous run: 9.91/10, +0.01)

* Better logging on partition

* Never adjust the active_partition

MariaDB has a limitation on editing the active partition, particularly:

`ERROR 1520 (HY000): Reorganize of range partitions cannot change total ranges
except for last partition where it can extend the range`

so we can't edit the active partition, either.

* Never edit positions on empty partitions

Like the previous commit, MariaDB has a limitation on editing any partition's
offset:

    `ERROR 1520 (HY000): Reorganize of range partitions cannot change total ranges
    except for last partition where it can extend the range`

So the positions field should never be edited for existing partitions, only
their names.

* Consolidate logic to use partition names as start-of-fill dates

* stderr is not so useful from the Subprocess Database Command, let's dump it

* Bugfix: get_current_positions needs to query the latest of each column

Before, get_current_positions returned every column's value from the single
row with the largest ID in the first column. For partitioning purposes each
column's position must always be strictly increasing, so we now query the
latest value of each column.

This does make such tables less space-efficient, but that's a matter for
partition design.
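
To illustrate the difference, here is a small sketch that uses in-memory rows as a stand-in for the table. The function names, the row layout, and the sample values are all hypothetical. The real get_current_positions queries the database, and its SQL is not shown in this PR.

```python
def positions_from_first_column(rows, columns):
    """Old behaviour (sketch): read every column from the row that has the
    largest value in the first column."""
    newest = max(rows, key=lambda row: row[columns[0]])
    return [newest[col] for col in columns]


def positions_per_column(rows, columns):
    """New behaviour (sketch): take the largest value of each column on its own."""
    return [max(row[col] for row in rows) for col in columns]


# Hypothetical data: the row with the newest id does not hold the newest serial.
rows = [
    {"id": 10, "serial": 500},
    {"id": 11, "serial": 120},
]
columns = ["id", "serial"]

print(positions_from_first_column(rows, columns))  # [11, 120]
print(positions_per_column(rows, columns))         # [11, 500]
```

With the old approach the second position can move backwards between runs (500 to 120 here), which is what breaks the strictly-increasing requirement.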

* Add "bootstrap" methods to prepare partitioned tables

Tables whose partitions don't contain datestamps of the p_YYYYMMDD form don't
provide partman enough info to derive rates of change, so these bootstrap
routines will save a YAML file somewhere with point-in-time data that can be
reloaded to derive a rate-of-change. This is only intended to be used for the
initial partitioning of a table, or when a table has no empty partitions.

In a subsequent commit I'll tie this into cli.py, making sure to add warnings
that these ALTERs cannot be expected to complete quickly and that the database
will likely hold locks for substantial amounts of time for each ALTER command.
The tool will simply print potential ALTER commands to the console for an
operator to analyze and run in the manner they find best.
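
As a rough sketch of how two point-in-time snapshots can yield a rate of change: the snapshot layout below (a `time` plus a list of `positions`) and the `rate_per_day` helper are assumptions for illustration. The actual YAML schema is not shown in this PR, and the real code may weight the rate differently.

```python
from datetime import datetime, timedelta


def rate_per_day(earlier, later):
    """Return the per-column increase per day between two snapshots."""
    days = (later["time"] - earlier["time"]) / timedelta(days=1)
    if days <= 0:
        raise ValueError("the later snapshot must be newer than the earlier one")
    return [
        (new - old) / days
        for old, new in zip(earlier["positions"], later["positions"])
    ]


# Hypothetical snapshots, as they might look after being reloaded from a saved file.
saved = {"time": datetime(2021, 4, 1), "positions": [1000, 50]}
current = {"time": datetime(2021, 4, 3), "positions": [1600, 70]}

print(rate_per_day(saved, current))  # [300.0, 10.0]
```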

* Wire up Bootstrap to the CLI

* Rework CLI to print yaml-like but stringified output
Aaron's ordering makes this a lot clearer [1]. Thank you, Sir Aaron!

[1] #4 (comment)
Rewriting many of the docstrings exposed how confusing a couple of methods
were.

The partition types' has_time and timestamp methods, while they worked, spiked
my WTFs-per-hour metric, so I renamed and reworked them a little to smooth out
the furrowed eyebrows. Now it's has_real_time, and it is exact rather than
a half-hearted effort.

The table information schema methods were from a bygone era, with inaccurate
docstrings, and formerly raised assertions and such, whereas now they return
a problem-string. I renamed them to get_table_information_schema_problems and
get_table_compatibility_problems. Still probably imperfect since they return
a single problem at a time, but at least they explain what they do.
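
The "return a problem-string" pattern can be sketched as below. The function name, its parameter, the individual checks, and the `None` return for "no problem" are hypothetical; only the idea of returning one problem description at a time, rather than raising an assertion, comes from the commit message.

```python
from typing import Optional


def first_table_problem(table_info: dict) -> Optional[str]:
    """Return a description of the first problem found, or None if there is none.

    table_info is a hypothetical dict, e.g. {"create_options": "partitioned"}.
    """
    options = table_info.get("create_options")
    if options is None:
        return "No CREATE_OPTIONS information available for the table"
    if "partitioned" not in options:
        return "Table does not have the partitioned feature"
    return None


for info in ({"create_options": "partitioned"}, {"create_options": ""}, {}):
    problem = first_table_problem(info)
    print(problem if problem else "compatible")
```

Because the caller gets a string instead of an exception, it can log the problem and move on to the next table. The cost, as noted above, is that only the first problem is reported.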

No true functionality changes are in this commit.
…e Maintain algo

The README.md expresses the "Maintain" algorithm in its ideal form, which got
messy in implementation. The get_pending_sql_reorganize_partition_commands
method makes that clean again, so that there's a clear map to the algorithm
description.
@jcjones
Collaborator Author

jcjones commented May 17, 2021

Yeah so this is cute and all, but it needs to produce data copy tooling before we can present this to the ball.

@jcjones jcjones marked this pull request as draft May 17, 2021 21:48
@jcjones jcjones marked this pull request as ready for review July 27, 2021 17:26
Add a --assume-partitioned-on flag to bootstrap, to facilitate operation on
tables that do not yet have a partition map. One needs to supply
`--assume-partitioned-on COLUMN_NAME` as many times as needed to identify all
the columns which will be part of the partition expression.

The 'bootstrap' command now emits table-copy instructions

The original plan for 'bootstrap' was to alter tables live, locking only what
each alteration needed. However, InnoDB likes to lock everything, so instead
we always assume that "bootstrapping" is a live table clone, and that at its
conclusion the team performs an atomic rename.
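
A clone-then-rename sequence could look roughly like the sketch below. This is not the output format of the bootstrap command, which is not shown here. The helper name, the `_new`/`_old` suffixes, and the single catch-all `p_start` partition are assumptions. Keeping the copy in sync with live writes is outside the sketch.

```python
def table_copy_statements(table, columns):
    """Build illustrative SQL for cloning a table into a partitioned copy and
    then swapping the two with one multi-table RENAME TABLE statement."""
    new, old = f"{table}_new", f"{table}_old"
    cols = ", ".join(f"`{col}`" for col in columns)
    maxvalues = ", ".join("MAXVALUE" for _ in columns)
    return [
        f"CREATE TABLE `{new}` LIKE `{table}`;",
        f"ALTER TABLE `{new}` PARTITION BY RANGE COLUMNS({cols}) "
        f"(PARTITION `p_start` VALUES LESS THAN ({maxvalues}));",
        f"INSERT INTO `{new}` SELECT * FROM `{table}`;",
        # A single RENAME TABLE listing both renames swaps the tables atomically.
        f"RENAME TABLE `{table}` TO `{old}`, `{new}` TO `{table}`;",
    ]


statements = table_copy_statements("orders", ["id"])
for statement in statements:
    print(statement)
```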
@jcjones jcjones closed this Jul 27, 2021
@jcjones jcjones force-pushed the bootstrap_assume_partitioned branch from 1dee3a1 to 2448fce Compare July 27, 2021 21:00
@jcjones jcjones deleted the bootstrap_assume_partitioned branch July 27, 2021 21:01
@jcjones
Collaborator Author

jcjones commented Jul 27, 2021

This got butchered in the process of rebasing it off PR #4, so re-opened as #13
