amotl commented Aug 19, 2025

About

@WalBeh contributed a powerful utility that uses CrateDB's system tables to detect imbalances in shard count and shard size distribution across the whole cluster. Thanks!

After analysing the situation, the program proposes solutions in the form of SQL commands that bring the cluster back into a more balanced state.
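To illustrate the idea, here is a minimal sketch, not XMover's actual code. It aggregates shard rows per node (rows of the kind one might select from `sys.shards`) and renders one `ALTER TABLE ... REROUTE MOVE SHARD` statement that moves the smallest shard from the fullest node to the emptiest one. The row layout, the function names, and the selection heuristic are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical rows: (schema, table, shard_id, node_name, size_bytes).
ROWS = [
    ("doc", "metrics", 0, "node-1", 40_000),
    ("doc", "metrics", 1, "node-1", 35_000),
    ("doc", "metrics", 2, "node-1", 30_000),
    ("doc", "metrics", 3, "node-2", 20_000),
]


def summarize(rows):
    """Return {node: {"count": shard count, "size": total bytes}}."""
    stats = defaultdict(lambda: {"count": 0, "size": 0})
    for _schema, _table, _shard_id, node, size in rows:
        stats[node]["count"] += 1
        stats[node]["size"] += size
    return dict(stats)


def suggest_move(rows):
    """Suggest moving the smallest shard from the fullest to the emptiest node.

    Nodes without any shard do not appear in `rows`, so this toy heuristic
    cannot see them; a real tool would also consult the node list.
    """
    stats = summarize(rows)
    if len(stats) < 2:
        return None
    source = max(stats, key=lambda n: stats[n]["size"])
    target = min(stats, key=lambda n: stats[n]["size"])
    if source == target:
        return None  # all nodes hold the same amount of data
    schema, table, shard_id, _node, _size = min(
        (row for row in rows if row[3] == source), key=lambda row: row[4]
    )
    return (
        f'ALTER TABLE "{schema}"."{table}" '
        f"REROUTE MOVE SHARD {shard_id} FROM '{source}' TO '{target}';"
    )


print(suggest_move(ROWS))
```

The statement is only printed, never executed, which mirrors the "present SQL commands" behaviour described above.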

Install

uv pip install --upgrade 'cratedb-toolkit @ git+https://github.com/crate/cratedb-toolkit.git@xmover'

Documentation

https://cratedb-toolkit--523.org.readthedocs.build/admin/xmover/

coderabbitai bot commented Aug 19, 2025

Walkthrough

Adds XMover: a new administrative toolkit for CrateDB that analyzes shard distribution, recommends and optionally executes safe shard relocations, validates moves, monitors recoveries, exposes a Click CLI and script entrypoint, provides models/utilities/analysis/operational modules, extensive docs, and unit tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Package & CLI Integration: `CHANGES.md`, `pyproject.toml`, `cratedb_toolkit/cli.py`, `cratedb_toolkit/admin/xmover/__init__.py` | Announce XMover in changelog; add rich dependency; add script entry point `scripts.xmover = "cratedb_toolkit.admin.xmover.cli:main"`; register top-level xmover CLI subcommand; add xmover package metadata. |
| Top-level CLI Implementation: `cratedb_toolkit/admin/xmover/cli.py` | New Click-based CLI wiring that initializes CrateDBClient, tests connectivity, and exposes subcommands (analyze, find-candidates, recommend, validate-move, test-connection, check-balance, shard-distribution, active-shards, zone-analysis, explain-error, monitor-recovery) with Rich output and interactive prompts. |
| Core Models: `cratedb_toolkit/admin/xmover/model.py` | New dataclasses and typed containers: NodeInfo, ShardInfo, RecoveryInfo, ActiveShardSnapshot, ActiveShardActivity, ShardRelocationRequest/Response (includes to_sql and safety_score), DistributionStats, SizeCriteria, ShardRelocationConstraints. |
| Database Client: `cratedb_toolkit/admin/xmover/util/database.py` | New CrateDBClient for HTTP /_sql access and helpers: execute_query, get_nodes_info, get_shards_info, get_shard_distribution_summary, test_connection, get_cluster_watermarks, get_active_recoveries, get_recovery_details, get_all_recovering_shards, get_active_shards_snapshot, and internal parsing helpers. |
| Formatting & Error Helpers: `cratedb_toolkit/admin/xmover/util/format.py`, `cratedb_toolkit/admin/xmover/util/error.py` | Add size/percentage/translog formatting helpers and explain_cratedb_error CLI helper (pattern-matching diagnostics) with a Rich Console instance. |
| Shard Analysis & Reporting: `cratedb_toolkit/admin/xmover/analysis/shard.py`, `.../analysis/table.py`, `.../analysis/zone.py` | Implement ShardAnalyzer, ShardReporter, DistributionAnalyzer, and ZoneReport: collection of cluster/table/zone distribution analysis, imbalance/anomaly detectors, move validation, decommission planning, and Rich rendering. |
| Operational Modules: `cratedb_toolkit/admin/xmover/operational/candidates.py`, `.../operational/recommend.py`, `.../operational/monitor.py` | CandidateFinder, ShardRelocationRecommender, and RecoveryMonitor: find candidate shards, generate/validate/optionally execute relocation plans with recovery-aware sequencing, and monitor ongoing recoveries (watch mode). |
| Attic / Skeleton: `cratedb_toolkit/admin/xmover/attic.py` | Commented skeleton for a decommission command (non-executable example/skeleton). |
| Documentation: `doc/admin/index.md`, `doc/admin/xmover/handbook.md`, `doc/admin/xmover/index.md`, `doc/admin/xmover/queries.md`, `doc/admin/xmover/troubleshooting.md`, `doc/index.md` | Add admin docs section and comprehensive XMover docs (handbook, index, query gallery, troubleshooting) and update docs toctree. |
| Tests: `tests/admin/test_cli.py`, `tests/admin/test_active_shard_monitor.py`, `tests/admin/test_distribution_analyzer.py`, `tests/admin/test_recovery_monitor.py` | Add unit/CLI tests for the xmover CLI and analysis/monitoring components, covering CLI command invocation, active-shard snapshots/activity, distribution analyzer behavior, and recovery monitoring parsing/formatting. |
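For readers unfamiliar with the HTTP `/_sql` access mentioned for the database client, the following is a rough standard-library sketch of how such a call can be made. It is not the `CrateDBClient` from this PR: the function names and the timeout default are assumptions. What it relies on from CrateDB is only that the endpoint accepts a JSON body with `stmt` and optional `args`, and answers with `cols` and `rows`.

```python
import json
import urllib.request


def build_sql_request(base_url, stmt, args=None):
    """Build a POST request for CrateDB's HTTP `/_sql` endpoint."""
    payload = {"stmt": stmt}
    if args:
        payload["args"] = list(args)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response):
    """Zip the `cols` and `rows` of a decoded response into dictionaries."""
    cols = response.get("cols", [])
    return [dict(zip(cols, row)) for row in response.get("rows", [])]


def execute_query(base_url, stmt, args=None, timeout=10.0):
    """Send the statement and return the decoded JSON response.

    Hypothetical helper: needs a reachable cluster, so it is not called here.
    """
    request = build_sql_request(base_url, stmt, args)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running cluster, `rows_as_dicts(execute_query("http://localhost:4200", "SELECT name FROM sys.nodes"))` would yield one dictionary per node. Keeping request building and response parsing separate from the network call makes both easy to unit-test without a database.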

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as xmover CLI
  participant Client as CrateDBClient
  participant Analyzer as ShardAnalyzer
  participant Recommender as ShardRelocationRecommender
  participant DB as CrateDB

  User->>CLI: xmover recommend [options]
  CLI->>Client: init + test_connection()
  CLI->>Analyzer: init(client)
  CLI->>Recommender: execute(constraints, auto_execute, validate, dry_run)
  Recommender->>Analyzer: generate_rebalancing_recommendations(constraints)
  Analyzer->>Client: get_nodes_info()/get_shards_info()
  Client-->>Analyzer: nodes, shards
  Analyzer-->>Recommender: recommendations
  alt auto_execute and not dry_run
    Recommender->>Client: execute_query(ALTER TABLE ... REROUTE MOVE SHARD ...)
    Client->>DB: POST /_sql
    DB-->>Client: result
    Client-->>Recommender: success/failure
    Recommender->>Recommender: _wait_for_recovery_capacity()
  else dry_run
    Recommender-->>CLI: render recommendations (no execution)
  end
  CLI-->>User: rendered output / SQL / status
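
The control flow in the diagram above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the `ShardRelocationRecommender` implementation: `execute` and `active_recoveries` are hypothetical callables standing in for the database client, and the polling parameters are invented defaults.

```python
import time
from typing import Callable, List, Tuple


def wait_for_capacity(
    active_recoveries: Callable[[], int],
    limit: int,
    poll_seconds: float,
    max_polls: int,
) -> bool:
    """Poll until fewer than `limit` recoveries are running, or give up."""
    for _ in range(max_polls):
        if active_recoveries() < limit:
            return True
        time.sleep(poll_seconds)
    return False


def run_recommendations(
    statements: List[str],
    execute: Callable[[str], bool],
    active_recoveries: Callable[[], int],
    auto_execute: bool = False,
    dry_run: bool = True,
    max_concurrent_recoveries: int = 1,
    poll_seconds: float = 0.0,
    max_polls: int = 100,
) -> List[Tuple[str, str]]:
    """Return a log of (status, statement) pairs."""
    log = []
    for stmt in statements:
        if dry_run or not auto_execute:
            # Only render the recommendation; nothing is sent to the cluster.
            log.append(("dry-run", stmt))
            continue
        log.append(("ok" if execute(stmt) else "failed", stmt))
        # Do not start the next move before the cluster has capacity again.
        if not wait_for_capacity(
            active_recoveries, max_concurrent_recoveries, poll_seconds, max_polls
        ):
            log.append(("timeout", stmt))
            break
    return log
```

Defaulting to `dry_run=True` means a caller has to opt in twice (`auto_execute=True` and `dry_run=False`) before any statement reaches the cluster.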
sequenceDiagram
  autonumber
  actor User
  participant CLI as xmover monitor-recovery
  participant Client as CrateDBClient
  participant Monitor as RecoveryMonitor
  participant DB as CrateDB

  User->>CLI: xmover monitor-recovery [--watch]
  CLI->>Monitor: start(watch)
  loop refresh_interval (watch mode)
    Monitor->>Client: get_all_recovering_shards(filters)
    Client->>DB: query sys.allocations/sys.shards
    DB-->>Client: rows
    Client-->>Monitor: RecoveryInfo[]
    Monitor-->>CLI: formatted table + deltas
  end
  CLI-->>User: summary
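
The "formatted table + deltas" step of the watch loop above could look roughly like this. It is a sketch only: the snapshot shape (a mapping from shard key to recovery percentage), the output format, and the `watch` helper are assumptions, and the real `RecoveryMonitor` works with `RecoveryInfo` objects and Rich rendering instead.

```python
import time
from typing import Callable, Dict, List, Tuple

ShardKey = Tuple[str, str, int]  # (schema, table, shard id)
Snapshot = Dict[ShardKey, float]  # recovery progress in percent


def progress_deltas(previous: Snapshot, current: Snapshot) -> List[str]:
    """Describe how recovery progress changed between two snapshots."""
    lines = []
    for key in sorted(current):
        schema, table, shard_id = key
        percent = current[key]
        if key in previous:
            change = f"{percent - previous[key]:+.1f}%"
        else:
            change = "new"
        lines.append(f"{schema}.{table}[{shard_id}] {percent:.1f}% ({change})")
    # Shards that vanished from the snapshot are no longer reported as recovering.
    for schema, table, shard_id in sorted(set(previous) - set(current)):
        lines.append(f"{schema}.{table}[{shard_id}] no longer recovering")
    return lines


def watch(
    fetch: Callable[[], Snapshot],
    emit: Callable[[str], None],
    refresh_interval: float = 0.0,
    iterations: int = 2,
) -> None:
    """Fetch snapshots repeatedly and emit the deltas after each refresh."""
    previous: Snapshot = {}
    for _ in range(iterations):
        current = fetch()
        for line in progress_deltas(previous, current):
            emit(line)
        previous = current
        time.sleep(refresh_interval)
```

Passing `fetch` and `emit` as callables keeps the loop independent of the database and of the terminal, which is convenient for tests.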
sequenceDiagram
  autonumber
  actor User
  participant CLI as xmover validate-move
  participant Analyzer as ShardAnalyzer
  participant Client as CrateDBClient

  User->>CLI: xmover validate-move <schema.table> <shard> <from> <to>
  CLI->>Analyzer: init(client)
  CLI->>Analyzer: validate_move_safety(recommendation, max_disk_usage)
  Analyzer->>Client: lookups (nodes/shards)
  Client-->>Analyzer: details
  Analyzer-->>CLI: (is_safe, reason)
  CLI-->>User: verdict + SQL command
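
A move validation that returns `(is_safe, reason)` plus the SQL command, as in the diagram above, might be sketched like this. The `Move` dataclass, the specific checks, and the 85% default threshold are assumptions for illustration; the checks performed by `validate_move_safety` in this PR may differ.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Move:
    """Hypothetical stand-in for a shard relocation request."""

    schema: str
    table: str
    shard_id: int
    from_node: str
    to_node: str

    def to_sql(self) -> str:
        return (
            f'ALTER TABLE "{self.schema}"."{self.table}" '
            f"REROUTE MOVE SHARD {self.shard_id} "
            f"FROM '{self.from_node}' TO '{self.to_node}';"
        )


def validate_move(
    move: Move,
    disk_usage_percent: Dict[str, float],
    shard_locations: Set[str],
    max_disk_usage: float = 85.0,
) -> Tuple[bool, str]:
    """Return (is_safe, reason); `shard_locations` lists nodes holding a copy."""
    if move.from_node == move.to_node:
        return False, "source and target node are identical"
    if move.to_node not in disk_usage_percent:
        return False, f"unknown target node '{move.to_node}'"
    if move.from_node not in shard_locations:
        return False, f"shard is not located on '{move.from_node}'"
    if move.to_node in shard_locations:
        return False, f"'{move.to_node}' already holds a copy of this shard"
    usage = disk_usage_percent[move.to_node]
    if usage > max_disk_usage:
        return False, f"target disk usage {usage:.1f}% exceeds {max_disk_usage:.1f}%"
    return True, "move looks safe"


move = Move("doc", "metrics", 2, "node-1", "node-2")
is_safe, reason = validate_move(move, {"node-1": 70.0, "node-2": 40.0}, {"node-1"})
print(is_safe, reason)
print(move.to_sql())
```

Returning a reason string alongside the boolean lets the CLI explain a rejection instead of only refusing it.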

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • WalBeh
  • surister
  • hammerhead

Poem

I nudge the shards with twitchy care,
I count their hops and map each lair.
I wait for recoveries, then gently shove,
No data lost — just balance and love.
Thump, thump, I nudge them home — then eat my carrot. 🥕

coderabbitai[bot]

This comment was marked as resolved.

amotl force-pushed the xmover branch 2 times, most recently from 6a6b8f0 to 5068671 on August 21, 2025 12:15

amotl commented Sep 19, 2025

Thank you, @WalBeh!

amotl requested review from seut and WalBeh on September 19, 2025 18:31
amotl marked this pull request as draft on September 23, 2025 21:19