Let bootstrap work on tables that aren't partitioned yet. #10

jcjones · 2021-04-28T06:09:22Z

Add a --assume-partitioned-on flag to bootstrap, to facilitate operation on
tables that do not yet have a partition map. One needs to supply
--assume-partitioned-on COLUMN_NAME as many times as needed to identify all
the columns which will be part of the partition expression.
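
A repeatable flag like this is commonly collected into a list. The sketch below shows one way to do that with `argparse` and `action="append"`. The program name and the `bootstrap` subcommand come from this PR; the parser wiring, the help text, and the example column names are assumptions for illustration, not the project's actual `cli.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a repeatable --assume-partitioned-on flag."""
    parser = argparse.ArgumentParser(prog="partition-manager")
    subparsers = parser.add_subparsers(dest="command", required=True)

    bootstrap = subparsers.add_parser("bootstrap")
    bootstrap.add_argument(
        "--assume-partitioned-on",
        metavar="COLUMN_NAME",
        action="append",  # each occurrence appends one column name
        default=[],
        help="column to treat as part of the partition expression (repeatable)",
    )
    return parser


# Hypothetical column names, supplied once per column.
args = build_parser().parse_args(
    [
        "bootstrap",
        "--assume-partitioned-on", "order_id",
        "--assume-partitioned-on", "line_id",
    ]
)
print(args.assume_partitioned_on)  # ['order_id', 'line_id']
```

The order of the list follows the order of the flags on the command line, which matters when the columns form a multi-column partition expression.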


This removes any dependency on the auto-increment value, as AI can only tell us a single column's position, and for multi-column partitions we'll need more than that. This change does not totally remove the get_autoincrement method, as we'll still want to confirm the table has the partitioned feature, which we'll do in a later commit.
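
To illustrate why a single auto-increment counter is not enough, here is a minimal sketch that builds one "current position" query per partition column. It assumes a column's position is simply its largest current value. The function name, the query shape, and the example table are hypothetical and are not taken from the project's code.

```python
from typing import List


def current_position_queries(table: str, columns: List[str]) -> List[str]:
    """Return one query per column, each fetching that column's largest value.

    Hypothetical helper: an auto-increment counter would cover only one of
    these columns, so each column is queried on its own.
    """
    if not columns:
        raise ValueError("at least one partition column is required")
    return [
        f"SELECT `{column}` FROM `{table}` ORDER BY `{column}` DESC LIMIT 1;"
        for column in columns
    ]


for query in current_position_queries("orders", ["order_id", "line_id"]):
    print(query)
```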


* First pass algorithm
* Add ability to compare partition positions
* Add a split method for dividing partition lists
* More tests
* Add a position rate function
* Add methods to determine a weighted rate of increase
* Add docs to the new table_append_partition methods
* Use the Partition timestamp() method
* plan_partition_changes algorithm
* More partition planning tests
* Predictive partitioning algorithm functioning in tests
* Rework the CLI to use the new partition planning algorithm
* Passing integration tests
* Handle short and bespoke partition names.
* Improve logging
* Remove spurious strip
* Moving to 0.2.0
* Logging cleanups
* Fix a host of pylint issues

  $ pylint --ignore-patterns=.*_test.py partitionmanager/ --disable W1203 --disable invalid-name --disable bad-continuation
  ************* Module partitionmanager.tools
  partitionmanager/tools.py:22:11: R1708: Do not raise StopIteration in generator, use return statement instead (stop-iteration-return)
  ************* Module partitionmanager.sql
  partitionmanager/sql.py:36:0: R0903: Too few public methods (1/2) (too-few-public-methods)
  ************* Module partitionmanager.stats
  partitionmanager/stats.py:12:0: R0903: Too few public methods (0/2) (too-few-public-methods)
  partitionmanager/stats.py:65:0: R0912: Too many branches (14/12) (too-many-branches)
  ************* Module partitionmanager.table_append_partition
  partitionmanager/table_append_partition.py:98:0: R0914: Too many local variables (16/15) (too-many-locals)
  partitionmanager/table_append_partition.py:306:0: R0914: Too many local variables (23/15) (too-many-locals)
  ------------------------------------------------------------------
  Your code has been rated at 9.92/10 (previous run: 9.91/10, +0.01)

* Better logging on partition
* Never adjust the active_partition

  MariaDB has a limitation on editing the active partition, particularly: `ERROR 1520 (HY000): Reorganize of range partitions cannot change total ranges except for last partition where it can extend the range` so we can't edit the active partition, either.

* Never edit positions on empty partitions

  Like the previous commit, MariaDB has a limitation on editing any partition's offset: `ERROR 1520 (HY000): Reorganize of range partitions cannot change total ranges except for last partition where it can extend the range` So the positions field should never be edited for existing partitions, only their names.

* Consolidate logic to use partition names as start-of-fill dates
* stderr is not so useful from the Subprocess Database Command, let's dump it
* Bugfix: get_current_positions needs to query the latest of each column

  Before, get_current_positions returned each column for the entry with the largest ID from the first column, while for partitioning purposes we actually want to always be strictly increasing. This does make such tables less space-efficient, but that's a matter for partition design.

* Add "bootstrap" methods to prepare partitioned tables

  Tables whose partitions don't contain datestamps of the p_YYYYMMDD form don't provide partman enough info to derive rates of change, so these bootstrap routines will save a YAML file somewhere with point-in-time data that can be reloaded to derive a rate-of-change. This is only intended to be used for the initial partitioning of a table, or when a table has no empty partitions.

  In a subsequent commit I'll tie this into cli.py, making sure to add alerts that these ALTERs cannot be expected to complete quickly, that the database will likely hold locks for substantial amounts of time for each of the ALTER commands, and that the tool will simply be printing potential ALTER commands to console for an operator to analyze and run in the manner they find best.

* Wire up Bootstrap to the CLI
* Rework CLI to print yaml-like but stringified output
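
The bootstrap item in the commit message above describes saving point-in-time data and reloading it later to derive a rate of change. The arithmetic behind that idea can be sketched as follows. The `Snapshot` type, the function name, and the example numbers are hypothetical; the tool's real YAML layout and its weighted-rate methods are not shown in this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical point-in-time record: when it was taken, and the
    highest value seen in each partition column at that moment."""

    taken_at: datetime
    positions: Tuple[int, ...]


def rates_per_day(earlier: Snapshot, later: Snapshot) -> List[float]:
    """Return each column's average increase per day between two snapshots."""
    elapsed = later.taken_at - earlier.taken_at
    if elapsed <= timedelta(0):
        raise ValueError("snapshots must be in chronological order")
    if len(earlier.positions) != len(later.positions):
        raise ValueError("snapshots must cover the same columns")
    days = elapsed / timedelta(days=1)  # elapsed time as a float number of days
    return [(b - a) / days for a, b in zip(earlier.positions, later.positions)]


saved = Snapshot(datetime(2021, 4, 1), (1000, 50))
current = Snapshot(datetime(2021, 4, 11), (3000, 150))
print(rates_per_day(saved, current))  # [200.0, 10.0]
```

With a per-column rate in hand, a planner can estimate where each column will be after one partition period, which is the information that p_YYYYMMDD partition names would otherwise provide.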


The rewrite of many of the docstrings surfaced a lot of confusion around a couple of methods.

The partition types' has_time and timestamp methods, while they worked, spiked my WTFs-per-hour metric, so I renamed and reworked them a little to smooth out the furrowed eyebrows. Now it's has_real_time, and it is exact rather than a half-hearted effort.

The table information schema methods were from a bygone era, with inaccurate docstrings, and formerly raised assertions and such, whereas now they return a problem-string. I renamed them to get_table_information_schema_problems and get_table_compatibility_problems. Still probably imperfect since they return a single problem at a time, but at least they explain what they do.

No true functionality changes are in this commit.

…e Maintain algo

The README.md expresses the "Maintain" algorithm in its ideal form, which got messy in implementation. The get_pending_sql_reorganize_partition_commands makes that clean again, so that there's a clear map to the algorithm description.

jcjones · 2021-05-17T21:48:15Z

Yeah so this is cute and all, but it needs to produce data copy tooling before we can present this to the ball.

Add a --assume-partitioned-on flag to bootstrap, to facilitate operation on tables that do not yet have a partition map. One needs to supply `--assume-partitioned-on COLUMN_NAME` as many times as needed to identify all the columns which will be part of the partition expression.

The 'bootstrap' command now emits table-copy instructions

The original plan for 'bootstrap' was to do live alterations that would lock only what they needed; however, InnoDB likes to lock everything. So instead we need to always assume that "bootstrapping" will be a live table clone, and at its conclusion the team should perform an atomic rename.
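
As a rough illustration of the clone-then-rename approach described above, the sketch below prints the kind of statements an operator might review. It is not the tool's actual output: the function name, the `_new`/`_old` suffixes, and the `p_start` partition name are assumptions, and it deliberately omits keeping the copy in sync with writes that arrive during the copy.

```python
from typing import List


def table_copy_statements(table: str, columns: List[str]) -> List[str]:
    """Return illustrative SQL for cloning `table` into a partitioned copy
    and swapping it into place with a single RENAME TABLE statement."""
    if not columns:
        raise ValueError("at least one partition column is required")
    new_table = f"{table}_new"  # hypothetical naming scheme
    old_table = f"{table}_old"  # hypothetical naming scheme
    column_list = ", ".join(f"`{column}`" for column in columns)
    max_values = ", ".join("MAXVALUE" for _ in columns)
    return [
        f"CREATE TABLE `{new_table}` LIKE `{table}`;",
        f"ALTER TABLE `{new_table}` PARTITION BY RANGE COLUMNS ({column_list}) "
        f"(PARTITION `p_start` VALUES LESS THAN ({max_values}));",
        f"INSERT INTO `{new_table}` SELECT * FROM `{table}`;",
        # One RENAME TABLE statement swaps both names atomically.
        f"RENAME TABLE `{table}` TO `{old_table}`, `{new_table}` TO `{table}`;",
    ]


for statement in table_copy_statements("orders", ["order_id", "line_id"]):
    print(statement)
```

Partitioning the empty clone before copying data keeps the ALTER cheap; the expensive step becomes the data copy, which does not need to lock the original table the way an in-place ALTER would.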

jcjones · 2021-07-27T21:06:02Z

This got butchered in the process of rebasing it off PR #4, so re-opened as #13

aarongable and others added 30 commits April 20, 2021 17:15

Empty commit to diff against

ebe046d

Initial functionality

1b96dc9

Fix cli test to expect the current date

5d06188

Add more CircleCI tests - flake8, pylint

58375fe

Initial PEP249 MySQL connector support

8ffc807

Remove db option

3bf8eb1

Use structures instead of line parsing for all DB queries

1ef13f6

- This moves from the -E to -X XML-based CLI and parses it

Functioning via DB connect for single-value partitioned tables

4504c0d

Test for duplicates, add a few tests of reorganize_partition

93e424b

Catch truncated xml results from a subprocess

6897140

Dwarf tables

883383e

Rename to sequential partition manager, and change CLI tool to partition-manager

eac56fa

Typo fix in README

72ad177

Confirm partitioned status before operating on any supplied table

7b74012

Removes the auto_increment mechanisms.

XML parser improved tests and assertions

8abb8f2

Move partition definitions from tuples to explicit classes

b400ce6

Show query debugging

69ee351

Add basic YAML configuration processing.

78d188f

This adds the necessary retention options for later command processing, which is not included in this commit.

Rename add_partition to just add

8568cf7

Add a lifespan configuration value and only partition when needed

2f489a1

Have full time resolution for partition decisions. Some logging improvements too.

bce1527

Add basic statistics command.

6dd5c2e

Print table compatibility issues for all tables before exiting.

c681cb7

Bugfix: Tables with multiple columns use BY RANGE COLUMNS

5b64d19

There are other BY RANGE options too, but let's just start with COLUMNS

Export Prometheus-style statistics for stats command, if configured

09f1a1d

Improve tests for the prometheus stats

3f8240b

Add a time_since_oldest gauge, and rename the gauges a bit

bce1224

Emit stats on an 'add' command too

b46b703

Fix prometheus quoting of the labels

ae7e133

jcjones and others added 22 commits April 20, 2021 17:17

Per-table partition durations

d9df699

v0.1.0: Minimal features complete

4371ecd

v0.1.1, Bugfix: yaml dburls weren't preprocessed

a830c33

Emit a statistic for table alteration time

dc1289d

Pre-commit: Make Black run before other tools

5a8b3a3

Spelling fix: partition_name_now

e2c263e

Rename partition_duration to partition_period

b014b7a

Update README, bugfix that --table and --in/--out were all mutually exclusive

476aedf

More documentation for the Types

c7c1664

Update per reviews as of 10am PT 21 April, except DocString revisions.

5e09ae2

Reorder conditionals in generate_sql_reorganize_partition_commands

6ee10ac

Aaron's ordering makes this a lot clearer [1]. Thank you, Sir Aaron! [1] #4 (comment)

Make Partition and PlannedPartition be internal types

a9b0f93

Make everything in the Partition classes internal

9313228

ChangePlannedPartition's old should be internal

d7adb39

Make all attributes of _PlannedPartition be internal

d1e2d49

Doc updates

bd983ec

Make all _internal where possible.

1a1155e

Move away from "from ... import" syntax

891d7d0

Rename "add" algorithm and mode to "maintain"

42dcdfa

jcjones marked this pull request as draft May 17, 2021 21:48

jcjones marked this pull request as ready for review July 27, 2021 17:26

jcjones closed this Jul 27, 2021

jcjones force-pushed the bootstrap_assume_partitioned branch from 1dee3a1 to 2448fce Compare July 27, 2021 21:00

jcjones deleted the bootstrap_assume_partitioned branch July 27, 2021 21:01
