Admin/XMover: Shard heat monitoring, optionally watching #527

amotl · 2025-08-21T19:48:23Z

About

At GH-526, we are thinking about next steps in user ergonomics when debugging a busy cluster. Based on the idea of computing deltas of certain metrics available through CrateDB's system tables about shards, @juanpardo starts exploring a few ideas here. Thanks!

Install

uv pip install --upgrade 'cratedb-toolkit @ git+https://github.com/crate/cratedb-toolkit.git@jp/xmover-shard-hot-monitor'

Details

Static reports are always topmost priority. In an optional variant, a --watch-like interval-updating view could also make sense. However, it is difficult to learn about historic progress/changes in this case, i.e. when leaving the screen for a while, so discrete output is always preferred as the primary operation mode.

- More or less just line-length fixes. - Only a single type adjustment was needed on the return value of the `analyze_distribution` method. - Ruff recommended to use set comprehensions, so here we go. - At a single spot where an exception has been `pass`ed, we added error output. Is it bad?

coderabbitai · 2025-08-21T19:53:20Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds a ShardMonitor class and CLI subcommand to periodically monitor hot shards (computing seq/size deltas and rendering tables), extends ShardInfo with three seq_no_stats fields, updates DB queries to return those fields, and introduces duplicate ShardMonitor declarations.

Changes

Cohort / File(s)	Summary of changes
Monitoring feature (UI & CLI) `cratedb_toolkit/admin/xmover/analysis/shard.py`, `cratedb_toolkit/admin/xmover/cli.py`	Adds `ShardMonitor` (init, monitor_shards, refresh, delta calculations, UI table generation) and a CLI `monitor_shards` subcommand; monitoring loops with interval/repeat and live rendering. Note: `ShardMonitor` is declared twice in the same module.
Data model update `cratedb_toolkit/admin/xmover/model.py`	Adds three int fields to `ShardInfo`: `seq_stats_max_seq_no`, `seq_stats_global_checkpoint`, `seq_stats_local_checkpoint` (no defaults; changes dataclass signature).
Database query changes `cratedb_toolkit/admin/xmover/util/database.py`	`get_shards_info` selects three additional `s.seq_no_stats[...]` columns and maps them into the new `ShardInfo` fields; row shape extended.
Imports / utilities `cratedb_toolkit/admin/xmover/analysis/shard.py` (imports)	Adds `datetime`, `sleep` (from `time`), `Live` (from `rich.live`) and uses `format_size`/delta mappings for UI. Patch includes an extraneous `unittest.result` import.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as xmover CLI
  participant Analyzer as ShardAnalyzer
  participant DB as CrateDB
  participant Monitor as ShardMonitor
  participant UI as Rich Console

  User->>CLI: run "xmover monitor_shards"
  CLI->>Analyzer: construct(client)
  CLI->>Monitor: construct(analyzer)
  loop polling
    Monitor->>Analyzer: refresh shards
    Analyzer->>DB: SELECT ... (+ seq_no_stats columns)
    DB-->>Analyzer: rows
    Analyzer-->>Monitor: list[ShardInfo]
    Monitor->>Monitor: compute seq/size deltas, update reference map
    Monitor->>UI: render shards table and nodes heat panel
    Monitor->>Monitor: sleep / repeat
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitch my whiskers at the stream,
Hot shards shimmer, numbers gleam,
Deltas dance across each node,
I hop to watch the monitoring road.
Tiny rabbit, metrics in steam 🐇

✨ Finishing Touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch jp/xmover-shard-hot-monitor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cratedb_toolkit/admin/xmover/analysis/shard.py (1)
1-15: Remove commented-out code to satisfy ERA001 lint checks

CI linting shows remaining ERA001 errors in cratedb_toolkit/admin/xmover/analysis/shard.py. Please delete all commented-out shards_table.add_column(...) and related lines, then re-run the formatter and linter:

• File: cratedb_toolkit/admin/xmover/analysis/shard.py
– Remove lines 871–873:
python # shards_table.add_column("Global Checkpoint", style="magenta") # shards_table.add_column("Local Checkpoint", style="magenta") # shards_table.add_column("Hot Timestamp", style="magenta")
– Remove lines 898–900:
python # str(shard.seq_stats_global_checkpoint), # str(shard.seq_stats_local_checkpoint), # str(hot_timestamp) if shard.hot_timestamp else "-",

After cleanup, run:
ruff format cratedb_toolkit/admin/xmover/analysis/shard.py
ruff check --fix cratedb_toolkit/admin/xmover/analysis/shard.py
to confirm no lint errors remain.

🧹 Nitpick comments (1)

cratedb_toolkit/admin/xmover/cli.py (1)

65-73: Add watch/interval flags to align with PR objective (optional)

To deliver the “--watch-like real-time updating view,” consider adding --watch and --refresh-interval options (mirroring monitor_recovery) and pass them to ShardMonitor.monitor_shards(watch, refresh_interval). Otherwise, keep this as a one-shot snapshot and document accordingly.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5068671 and 0ed4a5f.

📒 Files selected for processing (4)

cratedb_toolkit/admin/xmover/analysis/shard.py (2 hunks)
cratedb_toolkit/admin/xmover/cli.py (2 hunks)
cratedb_toolkit/admin/xmover/model.py (1 hunks)
cratedb_toolkit/admin/xmover/util/database.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.12.2)

cratedb_toolkit/admin/xmover/analysis/shard.py

8-8: time.sleep imported but unused

Remove unused import: time.sleep

(F401)

871-871: Found commented-out code

Remove commented-out code

(ERA001)

872-872: Found commented-out code

Remove commented-out code

(ERA001)

873-873: Found commented-out code

Remove commented-out code

(ERA001)

898-898: Found commented-out code

Remove commented-out code

(ERA001)

899-899: Found commented-out code

Remove commented-out code

(ERA001)

900-900: Found commented-out code

Remove commented-out code

(ERA001)

🪛 GitHub Actions: Tests: Common

cratedb_toolkit/admin/xmover/analysis/shard.py

[error] 1-1: ruff format --check would reformat cratedb_toolkit/admin/xmover/analysis/shard.py. 1 file would be reformatted. Lint step 'lint[0]' aborted with exit code 1.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-and-publish
GitHub Check: build-and-publish

🔇 Additional comments (4)

cratedb_toolkit/admin/xmover/model.py (1)

32-51: All ShardInfo instantiations have been updated—no further action required

Verified a single instantiation in cratedb_toolkit/admin/xmover/util/database.py (lines 166–183)
– All 16 parameters of the updated ShardInfo dataclass are passed by name and in the correct order.

cratedb_toolkit/admin/xmover/cli.py (1)

13-13: Import looks correct and aligns with new feature wiring
cratedb_toolkit/admin/xmover/util/database.py (2)
179-183: ShardInfo mapping expects Optional[int] after model change

Once ShardInfo types are Optional[int], these assignments are fine even if the query returns NULL. If you prefer strict ints, add COALESCE in SQL or convert here.

150-155: Refactor hot_timestamp extraction to avoid unsupported array lambdas

CrateDB does not support higher‐order functions like transform(array, lambda) nor array_max on arrays of objects—it only provides array_max(array) for primitive arrays (cratedb.com).

Instead, guard against out-of-bounds access with a length check (for example, using cardinality) and fall back to NULL when the second lease isn’t present. You can replace the direct index access:
-    s.routing_state,
-    s.seq_no_stats['max_seq_no'],
-    s.seq_no_stats['global_checkpoint'],
-    s.seq_no_stats['local_checkpoint'],
-    s.retention_leases['leases'][1]['timestamp']
+    s.routing_state,
+    s.seq_no_stats['max_seq_no']        AS max_seq_no,
+    s.seq_no_stats['global_checkpoint'] AS global_checkpoint,
+    s.seq_no_stats['local_checkpoint']  AS local_checkpoint,
+    CASE
+      WHEN cardinality(s.retention_leases['leases']) > 1
+      THEN s.retention_leases['leases'][1]['timestamp']
+      ELSE NULL
+    END AS hot_timestamp
This avoids runtime errors when leases has fewer than two elements (and still returns the second-element timestamp when available).

If you truly need the latest lease timestamp across all objects, you must either:

Flatten and aggregate in your application layer (since CrateDB lacks array lambdas), or

Define a custom UDF (e.g., in JavaScript) to compute the max timestamp on the server side.

Likely an incorrect or invalid review comment.

cratedb_toolkit/admin/xmover/analysis/shard.py

coderabbitai · 2025-08-21T20:05:14Z

cratedb_toolkit/admin/xmover/analysis/shard.py

+    def __init__(self, analyzer: ShardAnalyzer):
+        self.analyzer = analyzer
+        self.shard_time_deltas: dict[str, int] = {}
+        self.shard_seq_deltas: dict[str, int] = {}
+


🛠️ Refactor suggestion

Store computed deltas and avoid key collisions across tables

The dictionaries are typed for int values, but hot deltas can be absent. Also, keying by node_id-shard_id risks collisions across tables. Make time deltas Optional[int] and use a compound key including schema+table.

Apply this diff:

class ShardMonitor: def __init__(self, analyzer: ShardAnalyzer): self.analyzer = analyzer - self.shard_time_deltas: dict[str, int] = {} - self.shard_seq_deltas: dict[str, int] = {} + self.shard_time_deltas: dict[str, Optional[int]] = {} + self.shard_seq_deltas: dict[str, int] = {}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 841 to 845, the two dicts currently typed as dict[str, int] and keyed only by node_id-shard_id can lose absent (hot) deltas and collide across different tables; change their type annotations to dict[str, Optional[int]] (or dict[tuple[str,str,str], Optional[int]] if you prefer tuple keys) to allow None for missing deltas, and change the keying to include schema and table (e.g. compound key using schema.table plus node_id and shard_id or a tuple of (schema, table, node_id, shard_id)) so entries are unique across tables; update all places that read/write these dicts to handle Optional[int] (check for None) and to construct/parse the new compound key consistently.

cratedb_toolkit/admin/xmover/analysis/shard.py

coderabbitai · 2025-08-21T20:05:15Z

cratedb_toolkit/admin/xmover/model.py

+    seq_stats_max_seq_no: int
+    seq_stats_global_checkpoint: int
+    seq_stats_local_checkpoint: int
+    hot_timestamp: int


🛠️ Refactor suggestion

Make new ShardInfo fields Optional with defaults to avoid None-related crashes and ease compatibility

seq_no_stats and retention_leases can be missing/null on some shards. With strict int typing and no defaults, construction will break and downstream code (e.g., datetime.fromtimestamp) can crash. Recommend Optional[int] with default None.

Apply this diff:

- seq_stats_max_seq_no: int - seq_stats_global_checkpoint: int - seq_stats_local_checkpoint: int - hot_timestamp: int + seq_stats_max_seq_no: Optional[int] = None + seq_stats_global_checkpoint: Optional[int] = None + seq_stats_local_checkpoint: Optional[int] = None + hot_timestamp: Optional[int] = None

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

seq_stats_max_seq_no: int

seq_stats_global_checkpoint: int

seq_stats_local_checkpoint: int

hot_timestamp: int

seq_stats_max_seq_no: Optional[int] = None

seq_stats_global_checkpoint: Optional[int] = None

seq_stats_local_checkpoint: Optional[int] = None

hot_timestamp: Optional[int] = None

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/model.py around lines 48 to 51, the integer fields seq_stats_max_seq_no, seq_stats_global_checkpoint, seq_stats_local_checkpoint, and hot_timestamp must be made optional with defaults to avoid construction crashes when values are missing/null; change their type annotations to Optional[int] and give them a default of None, and ensure Optional is imported from typing (or typing import Optional is added at top); likewise, review nearby shard fields like seq_no_stats and retention_leases and apply Optional with default None where they can be absent.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (6)

cratedb_toolkit/admin/xmover/analysis/shard.py (6)
884-890: Make interval configurable and refresh both snapshots

Hardcoding 10 seconds and not refreshing the baseline snapshot may surprise users. Expose interval and always refresh both snapshots to get a consistent pair.

Apply this diff:
-    def monitor_shards(self):
-        max_seconds_old = 300
-
-        self.refresh_deltas(refresh_data=False)
-        sleep(10)
-        self.refresh_deltas()
+    def monitor_shards(self, interval_seconds: int = 10, max_seconds_old: int = 300):
+        # Take two fresh snapshots separated by interval_seconds
+        self.refresh_deltas(refresh_data=True)
+        if interval_seconds > 0:
+            sleep(interval_seconds)
+        self.refresh_deltas(refresh_data=True)
891-893: Sort by hottest (largest seq delta first) and wrap long line

Current sort is ascending and exceeds line length. Sort by delta desc to surface hot shards first and break lines.

Apply this diff:
-        shards: list[ShardInfo] = self.filter_shards(max_seconds_old)
-        sorted_shards: list[ShardInfo] = sorted(shards, key=lambda s: self.shard_seq_deltas[self._get_shard_compound_id(s)])
+        shards: list[ShardInfo] = self.filter_shards(max_seconds_old)
+        sorted_shards: list[ShardInfo] = sorted(
+            shards,
+            key=lambda s: self.shard_seq_deltas.get(self._get_shard_compound_id(s), 0),
+            reverse=True,
+        )
907-918: Render human-friendly age and guard missing deltas

Show "-" when age is unknown and add "s" suffix; guard seq delta lookup.

Apply this diff:
         for shard in sorted_shards:
             shard_compound_id = self._get_shard_compound_id(shard)
             shards_table.add_row(
                 shard.schema_name,
                 shard.table_name,
                 str(shard.shard_id),
                 shard.node_name,
                 str(shard.is_primary),
                 format_size(shard.size_gb),
-                str(self.shard_seq_deltas[shard_compound_id]),
-                str(self.shard_time_deltas[shard_compound_id]),
+                str(self.shard_seq_deltas.get(shard_compound_id, 0)),
+                (f"{self.shard_time_deltas.get(shard_compound_id)}s"
+                 if self.shard_time_deltas.get(shard_compound_id) is not None
+                 else "-"),
             )
849-849: Trim trailing whitespace on blank lines

These lines trigger W293 in Ruff. Remove stray spaces.

Also applies to: 922-922

863-865: Align model typing for hot_timestamp with runtime usage

This code treats hot_timestamp as Optional[int], but ShardInfo currently annotates it as int (see cratedb_toolkit/admin/xmover/model.py:32-54). If None is possible from sys.shards, update the dataclass to Optional[int] and ensure util/database populates None accordingly.

Would you like me to patch model.py and util/database.py to reflect Optional[int] and add unit tests covering None timestamps?

Also applies to: 878-883

884-923: Optional: add --watch loop using Rich Live for continuous mode

Given the PR goal, consider a watch loop (opt-in) that refreshes and re-renders at interval_seconds using rich.live.Live. This keeps discrete mode as default while enabling continuous monitoring via a flag wired in CLI.

If you want, I can draft the Live-based loop and a CLI flag (e.g., --watch/--interval).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0ed4a5f and 449a8a9.

📒 Files selected for processing (1)

cratedb_toolkit/admin/xmover/analysis/shard.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

cratedb_toolkit/admin/xmover/analysis/shard.py (3)

cratedb_toolkit/admin/xmover/model.py (2)

ShardInfo (33-55)

size_gb (85-87)

cratedb_toolkit/admin/xmover/cli.py (1)

monitor_shards (67-72)

cratedb_toolkit/admin/xmover/util/format.py (1)

format_size (1-8)

🪛 Ruff (0.12.2)

cratedb_toolkit/admin/xmover/analysis/shard.py

849-849: Blank line contains whitespace

Remove whitespace from blank line

(W293)

867-867: Test for membership should be not in

Convert to not in

(E713)

870-870: Line too long (128 > 120)

(E501)

874-874: Variable id is shadowing a Python builtin

(A001)

892-892: Line too long (124 > 120)

(E501)

922-922: Blank line contains whitespace

Remove whitespace from blank line

(W293)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: build-and-publish
GitHub Check: build-and-test
GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
GitHub Check: CFR for OS ubuntu-latest
GitHub Check: CFR for OS macos-latest
GitHub Check: CFR for OS windows-latest
GitHub Check: CFR for OS macos-13

🔇 Additional comments (2)

cratedb_toolkit/admin/xmover/analysis/shard.py (2)

8-8: Resolved: sleep import now used by monitor_shards()

The prior lint warning about an unused import is addressed; sleep is used on Line 888. No action needed here.

840-921: ShardMonitor definitions, CLI flags, and hot_timestamp hydration all verified

Only one ShardMonitor class exists (in analysis/shard.py).

CLI exposes --watch/-w and --refresh-interval flags in cratedb_toolkit/admin/xmover/cli.py.

hot_timestamp is correctly read from the DB (row[15] in database.py), stored as an integer in the model, and converted from milliseconds to seconds in analysis/shard.py.

No further action required.

cratedb_toolkit/admin/xmover/analysis/shard.py

coderabbitai · 2025-08-21T20:23:21Z

cratedb_toolkit/admin/xmover/analysis/shard.py

+    def _get_shard_compound_id(self, shard: ShardInfo) -> str:
+        return f"{shard.node_id}-{shard.shard_id}"
+


🛠️ Refactor suggestion

Compound shard key risks collisions across tables

Keying by node_id-shard_id can collide between tables sharing the same shard id on the same node. Include schema and table to guarantee uniqueness.

Apply this diff:

def _get_shard_compound_id(self, shard: ShardInfo) -> str: - return f"{shard.node_id}-{shard.shard_id}" + # schema.table:shard_id@node_id + return f"{shard.schema_name}.{shard.table_name}:{shard.shard_id}@{shard.node_id}"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _get_shard_compound_id(self, shard: ShardInfo) -> str:

return f"{shard.node_id}-{shard.shard_id}"

def _get_shard_compound_id(self, shard: ShardInfo) -> str:

# schema.table:shard_id@node_id

return f"{shard.schema_name}.{shard.table_name}:{shard.shard_id}@{shard.node_id}"

🧰 Tools

🪛 Ruff (0.12.2)

849-849: Blank line contains whitespace

Remove whitespace from blank line

(W293)

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 847-849 the compound key uses only node_id and shard_id which can collide for different tables on the same node; change the function to include schema and table identifiers (e.g. schema and table name) along with node_id and shard_id to guarantee uniqueness, and defensively handle missing attributes (use available schema/table fields or sensible fallbacks) so callers continue to work.

cratedb_toolkit/admin/xmover/analysis/shard.py

amotl · 2025-08-22T17:10:33Z

cratedb_toolkit/admin/xmover/cli.py

+@main.command()
+@click.pass_context
+def monitor_shards(ctx):
+    """Monitor shards, pointing out hot ones"""
+    client = ctx.obj["client"]
+    analyzer = ShardAnalyzer(client)
+    monitor = ShardMonitor(analyzer)
+    monitor.monitor_shards()


Just an idea about »naming things«: xmover monitor-shards certainly fits the bill, but what do you think about xmover shard-heat ¹ to better reflect its intentions?

Footnotes

... with an optional --watch option in the long run, if we can make the program also produce static one-shot reports in a future iteration? ↩

coderabbitai

Actionable comments posted: 10

♻️ Duplicate comments (1)

cratedb_toolkit/admin/xmover/cli.py (1)
65-65: Command naming: consider shard-heat for intent clarity (alias or rename)

Prior review suggested shard-heat as a better name. You can either rename or add an alias while keeping monitor-shards.

Option A (rename; simplest):
-@main.command(name="monitor-shards")
+@main.command(name="shard-heat")
Option B (add alias while keeping current name):
 @main.command(name="shard-heat")
 @click.option(... same options ...)
 def monitor_shards(...):
     ...
+
+# Register alias for backward-compat
+main.add_command(monitor_shards, name="monitor-shards")

🧹 Nitpick comments (6)

cratedb_toolkit/admin/xmover/cli.py (1)
72-78: Wire a --watch UX to match PR goals (optional live mode)

The PR mentions a watch-style mode; currently repeat=-1 simulates continuous mode but Live rendering toggle is hardcoded elsewhere. Consider adding a --watch/--no-watch flag and passing it through, so we can enable Rich Live when desired.

Proposed minimal CLI change (ShardsMonitor will need a matching parameter):
-@click.pass_context
-def monitor_shards(ctx, table, wait_time, repeat, max_results, sort_by):
+@click.option("--watch/--no-watch", default=False, help="Live-updating view")
+@click.pass_context
+def monitor_shards(ctx, table, wait_time, repeat, max_results, sort_by, watch):
     """Monitor shards, pointing out hot ones"""
     client = ctx.obj["client"]
     analyzer = ShardAnalyzer(client)
     monitor = ShardMonitor(analyzer)
-    monitor.monitor_shards(table, wait_time, repeat, max_results, sort_by)
+    monitor.monitor_shards(table, wait_time, repeat, max_results, sort_by, watch=watch)
If you want, I can follow up with the matching changes in ShardMonitor.monitor_shards.
cratedb_toolkit/admin/xmover/util/database.py (1)
178-181: Cast seq-no fields to int on ingestion to avoid Optional[int] surprises

Make types explicit and robust against None (post-COALESCE this is defensive, but keeps the Python side consistent).
-                    seq_stats_max_seq_no=row[12],
-                    seq_stats_global_checkpoint=row[13],
-                    seq_stats_local_checkpoint=row[14],
+                    seq_stats_max_seq_no=int(row[12] or 0),
+                    seq_stats_global_checkpoint=int(row[13] or 0),
+                    seq_stats_local_checkpoint=int(row[14] or 0),
cratedb_toolkit/admin/xmover/analysis/shard.py (4)
971-974: Wrap long sort line (Ruff E501) and guard against missing keys

Minor readability fix and added safety.
-            self.latest_shards = sorted(updated_shards, key=lambda s: self.seq_deltas[self._get_shard_compound_id(s)],
-                                        reverse=True)
+            self.latest_shards = sorted(
+                updated_shards,
+                key=lambda s: self.seq_deltas.get(self._get_shard_compound_id(s), 0),
+                reverse=True,
+            )
900-913: Consider hiding DEBUG columns behind a flag (cleaner default UI)

The two DEBUG columns are useful during development but noisy for normal use. Gate them behind a debug flag or remove from default header.

Example approach:

Add a show_debug: bool = False attribute to ShardMonitor.

Conditionally add the debug columns and values.

883-893: Minor: Node heat sorting by name descending might be unexpected

Using reverse=True sorts both heat and node name descending. Often, descending by heat and ascending by name is preferred for stable alphabetical tie-breaks.
-        sorted_items = sorted(heat_nodes_info.items(), key=lambda kv: (kv[1], kv[0]),  reverse=True)
+        sorted_items = sorted(
+            heat_nodes_info.items(),
+            key=lambda kv: (-kv[1], kv[0])
+        )
842-849: State initialization is deferred; minimal guard note

Attributes like seq_deltas/size_deltas are assigned via refresh_data(). This is fine since they’re used only after that call in monitor_shards, but be mindful if other entrypoints get introduced. Optional: initialize to empty dicts for robustness.
-        self.reference_shards: dict[str, ShardInfo]
-        self.latest_shards: list[ShardInfo]
-        self.seq_deltas: dict[str, int]
-        self.size_deltas: dict[str, float]
+        self.reference_shards: dict[str, ShardInfo] = {}
+        self.latest_shards: list[ShardInfo] = []
+        self.seq_deltas: dict[str, int] = {}
+        self.size_deltas: dict[str, float] = {}

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 449a8a9 and 2b2cfb6.

📒 Files selected for processing (4)

cratedb_toolkit/admin/xmover/analysis/shard.py (2 hunks)
cratedb_toolkit/admin/xmover/cli.py (2 hunks)
cratedb_toolkit/admin/xmover/model.py (1 hunks)
cratedb_toolkit/admin/xmover/util/database.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

cratedb_toolkit/admin/xmover/model.py

🧰 Additional context used

🧬 Code graph analysis (2)

cratedb_toolkit/admin/xmover/analysis/shard.py (3)

cratedb_toolkit/admin/xmover/model.py (2)

ShardInfo (33-54)

size_gb (84-86)

cratedb_toolkit/admin/xmover/cli.py (1)

monitor_shards (72-77)

cratedb_toolkit/admin/xmover/util/format.py (1)

format_size (1-8)

cratedb_toolkit/admin/xmover/cli.py (1)

cratedb_toolkit/admin/xmover/analysis/shard.py (3)

ShardAnalyzer (33-839)

ShardMonitor (842-991)

monitor_shards (853-881)

🪛 Ruff (0.12.2)

cratedb_toolkit/admin/xmover/analysis/shard.py

4-4: datetime imported but unused

Remove unused import: datetime

(F401)

10-10: unittest.result imported but unused

Remove unused import: unittest.result

(F401)

853-853: Line too long (146 > 120)

(E501)

864-864: Line too long (173 > 120)

(E501)

868-868: Line too long (135 > 120)

(E501)

874-874: Line too long (126 > 120)

(E501)

876-876: Line too long (122 > 120)

(E501)

968-968: Line too long (137 > 120)

(E501)

cratedb_toolkit/admin/xmover/cli.py

67-67: Line too long (188 > 120)

(E501)

68-68: Line too long (192 > 120)

(E501)

70-70: Line too long (140 > 120)

(E501)

🪛 GitHub Actions: Tests: Common

cratedb_toolkit/admin/xmover/analysis/shard.py

[error] 853-853: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' when evaluating annotation 'str | None' in ShardMonitor.monitor_shards. Python 3.9 does not support the PEP 604 union operator. This occurred during 'pytest -m cfr' collection. Upgrade to Python 3.10+ or replace with 'Optional[str]' / 'Union[str, None]'.

cratedb_toolkit/admin/xmover/cli.py

[error] 1-1: Ruff format check detected formatting issues. This file would be reformatted by Ruff. Step: Poe check (lint[0]); exit code 1.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: build-and-publish
GitHub Check: build-and-publish
GitHub Check: CFR for OS macos-latest
GitHub Check: CFR for OS ubuntu-latest
GitHub Check: CFR for OS windows-latest
GitHub Check: CFR for OS macos-13

🔇 Additional comments (1)

cratedb_toolkit/admin/xmover/analysis/shard.py (1)
936-941: Verify schema_name on ShardInfo before updating compound ID
Including the schema in the shard‐ID string is a sound approach to prevent collisions when the same table name exists in multiple schemas. Before applying the refactor, please confirm:

• The ShardInfo type actually exposes a schema_name attribute.
• All callers of _get_shard_compound_id (e.g., map keys, logging, UI components) can handle the new format.

If schema_name is available and you update the method, your change might look like:
-    def _get_shard_compound_id(self, shard: ShardInfo) -> str:
-        if self.sort_by == 'node':
-            return f"{shard.node_name}-{shard.table_name}-{shard.shard_id}"
-        else:
-            return f"{shard.table_name}-{shard.shard_id}-{shard.node_name}"
+    def _get_shard_compound_id(self, shard: ShardInfo) -> str:
+        # include schema to avoid collisions across schemas
+        if self.sort_by == "node":
+            return f"{shard.node_name}-{shard.schema_name}.{shard.table_name}-{shard.shard_id}"
+        return f"{shard.schema_name}.{shard.table_name}-{shard.shard_id}-{shard.node_name}"
Please verify the presence of schema_name and update downstream usage as needed.

coderabbitai · 2025-08-25T11:22:24Z

cratedb_toolkit/admin/xmover/analysis/shard.py

 Shard analysis and rebalancing logic for CrateDB
 """
-
+import datetime


⚠️ Potential issue

Remove unused import to satisfy Ruff (F401)

datetime is unused in this module version.

-import datetime

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

import datetime

🧰 Tools

🪛 Ruff (0.12.2)

4-4: datetime imported but unused

Remove unused import: datetime

(F401)

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around line 4, the module currently imports datetime but does not use it; remove the unused import line "import datetime" to satisfy the Ruff F401 lint rule and keep imports minimal.

coderabbitai · 2025-08-25T11:22:25Z

cratedb_toolkit/admin/xmover/analysis/shard.py

 from collections import defaultdict
+from time import sleep
 from typing import Any, Dict, List, Optional, Set, Tuple, Union
+from unittest import result


⚠️ Potential issue

Remove unused unittest.result import (Ruff F401)

-from unittest import result

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from unittest import result

🧰 Tools

🪛 Ruff (0.12.2)

10-10: unittest.result imported but unused

Remove unused import: unittest.result

(F401)

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around line 10, the import "from unittest import result" is unused and flagged by Ruff F401; remove that import line (or replace it with the correct needed import if the symbol was intended to be used) so the file no longer imports an unused name.

coderabbitai · 2025-08-25T11:22:25Z

cratedb_toolkit/admin/xmover/analysis/shard.py

+        self.table_filter: str|None = None
+        self.sort_by: str = 'heat'
+
+    def monitor_shards(self, table_filter: str | None, interval_in_seconds: int = 5, repeat: int = 10, n_shards: int = 40, sort_by: str = 'heat'):


⚠️ Potential issue

Python 3.9 compatibility: replace PEP 604 unions with Optional[...]

Pipeline fails at runtime when evaluating str | None. Use Optional[str] to support Python 3.9.

- self.table_filter: str|None = None + from typing import Optional # if not already imported at top + self.table_filter: Optional[str] = None

And update the method signature:

-def monitor_shards(self, table_filter: str | None, interval_in_seconds: int = 5, repeat: int = 10, n_shards: int = 40, sort_by: str = 'heat'): +def monitor_shards( + self, + table_filter: Optional[str], + interval_in_seconds: int = 5, + repeat: int = 10, + n_shards: int = 40, + sort_by: str = "heat", + watch: bool = False, # optional if implementing CLI watch flag +):

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Ruff (0.12.2)

853-853: Line too long (146 > 120)

(E501)

🪛 GitHub Actions: Tests: Common

[error] 853-853: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' when evaluating annotation 'str | None' in ShardMonitor.monitor_shards. Python 3.9 does not support the PEP 604 union operator. This occurred during 'pytest -m cfr' collection. Upgrade to Python 3.10+ or replace with 'Optional[str]' / 'Union[str, None]'.

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 850 to 853, the code uses PEP 604 union types (str | None) which break on Python 3.9; change all occurrences to typing.Optional[str] (e.g., self.table_filter: Optional[str] = None and def monitor_shards(self, table_filter: Optional[str], ...)), and ensure you add/import "from typing import Optional" at the top of the file if not already present.

coderabbitai · 2025-08-25T11:22:25Z

cratedb_toolkit/admin/xmover/analysis/shard.py

+            with Live(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas), refresh_per_second=4, console=console) as live_shards:
+                while True:
+                    sleep(interval_in_seconds)
+                    self.refresh_data()
+                    live_shards.update(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas))
+        else:


🛠️ Refactor suggestion

Wrap long Rich Live line(s) to pass Ruff E501 and optionally drive watch

Two lines exceed 120 chars. Also, go_live is hardcoded; consider using a watch parameter from the CLI (optional).

- go_live = False + go_live = watch # drive from CLI; default False if go_live: - with Live(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas), refresh_per_second=4, console=console) as live_shards: + with Live( + self.generate_shards_table( + self._get_top_shards(self.latest_shards, n_shards), + self.seq_deltas, + ), + refresh_per_second=4, + console=console, + ) as live_shards: while True: sleep(interval_in_seconds) self.refresh_data() - live_shards.update(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas)) + live_shards.update( + self.generate_shards_table( + self._get_top_shards(self.latest_shards, n_shards), + self.seq_deltas, + ) + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

with Live(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas), refresh_per_second=4, console=console) as live_shards:

while True:

sleep(interval_in_seconds)

self.refresh_data()

live_shards.update(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas))

else:

go_live = watch # drive from CLI; default False

if go_live:

with Live(

self.generate_shards_table(

self._get_top_shards(self.latest_shards, n_shards),

self.seq_deltas,

),

refresh_per_second=4,

console=console,

) as live_shards:

while True:

sleep(interval_in_seconds)

self.refresh_data()

live_shards.update(

self.generate_shards_table(

self._get_top_shards(self.latest_shards, n_shards),

self.seq_deltas,

)

)

else:

# existing non-live rendering path

🧰 Tools

🪛 Ruff (0.12.2)

864-864: Line too long (173 > 120)

(E501)

868-868: Line too long (135 > 120)

(E501)

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 864 to 869, two lines calling Live(..., refresh_per_second=4, console=console) and live_shards.update(...) exceed 120 characters; wrap the long Rich Live invocation and the update call to multiple shorter concatenated/indented lines to satisfy Ruff E501 (<=120 chars). Also replace the hardcoded go_live/watch behavior by wiring a CLI parameter (e.g., watch or go_live) into this branch and use that parameter instead of a literal so the visibility loop is controlled externally.

cratedb_toolkit/admin/xmover/analysis/shard.py

coderabbitai · 2025-08-25T11:22:25Z

cratedb_toolkit/admin/xmover/analysis/shard.py

+                    str(self.reference_shards[shard_compound_id].seq_stats_max_seq_no),
+                    str(shard.seq_stats_max_seq_no)
+                )
+        console.print(shards_table)


🛠️ Refactor suggestion

Avoid double-printing table (UI prints twice per iteration)

display_shards_table_rows prints the table, and then the caller prints it again. Remove the internal print to avoid duplicate output.

- console.print(shards_table)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

console.print(shards_table)

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around line 934, the function display_shards_table_rows is printing the shards_table internally and the caller also prints it, causing duplicate UI output; remove the internal console.print(shards_table) call (or change the function to only return the table and let the caller handle printing) so the table is printed exactly once by the caller.

coderabbitai · 2025-08-25T11:22:25Z

cratedb_toolkit/admin/xmover/analysis/shard.py

+                refreshed_number = shard.seq_stats_max_seq_no
+                reference = reference_shards[shard_compound_id].seq_stats_max_seq_no
+
+                if refreshed_number < reference:
+                    refreshed_number += 2 ** 63 - 1
+
+                seq_result[shard_compound_id] = refreshed_number - reference
+                size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb


💡 Verification agent

🧩 Analysis chain

Defensive handling of None seq-no and rollover magic constant

Coalesce seq_stats_max_seq_no to 0 to avoid TypeError when the DB returns NULLs (e.g., non-STARTED shards).

The rollover adjustment uses 2 ** 63 - 1. Please confirm this matches CrateDB’s seq_no range; otherwise compute delta against a modulo base or reset reference.

- refreshed_number = shard.seq_stats_max_seq_no - reference = reference_shards[shard_compound_id].seq_stats_max_seq_no + refreshed_number = int(shard.seq_stats_max_seq_no or 0) + reference = int(reference_shards[shard_compound_id].seq_stats_max_seq_no or 0) - if refreshed_number < reference: - refreshed_number += 2 ** 63 - 1 + # Handle rollover: if sequence decreased, assume wraparound. + # TODO: confirm expected max range for seq_no in CrateDB. + if refreshed_number < reference: + refreshed_number += (2**63 - 1)

🏁 Script executed:

#!/bin/bash # Locate references to seq_stats_max_seq_no and show context rg -nH -C3 'seq_stats_max_seq_no' .

Length of output: 2745

Improve seq-no handling: coalesce NULLs → 0 & correct wrap-around delta logic

To avoid runtime errors when seq_stats_max_seq_no is NULL (e.g. non-STARTED shards) and ensure the rollover fixup uses the full 64-bit modulus, please apply the following changes:

• In cratedb_toolkit/admin/xmover/util/database.py (row[12]) and the corresponding model, coalesce the DB value to 0 so downstream code never sees None and TypeErrors from comparisons or arithmetic.
• In cratedb_toolkit/admin/xmover/analysis/shard.py (lines 954–961), update the seq-no math to:

Cast to int with or 0 defaults

On wrap-around (current < reference), add 1 << 64 (the full unsigned 64-bit modulus), or use modular arithmetic ((refreshed - reference) % (1 << 64)) to get the correct delta.

Proposed diff:

- refreshed_number = shard.seq_stats_max_seq_no - reference = reference_shards[shard_compound_id].seq_stats_max_seq_no + refreshed_number = int(shard.seq_stats_max_seq_no or 0) + reference = int(reference_shards[shard_compound_id].seq_stats_max_seq_no or 0) - if refreshed_number < reference: - refreshed_number += 2 ** 63 - 1 + if refreshed_number < reference: + # Handle signed-long overflow: wrap around using full 64-bit modulus + refreshed_number += 1 << 64 - seq_result[shard_compound_id] = refreshed_number - reference + # Or simply: seq_result[shard_compound_id] = (refreshed_number - reference) % (1 << 64) + seq_result[shard_compound_id] = refreshed_number - reference size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

refreshed_number = shard.seq_stats_max_seq_no

reference = reference_shards[shard_compound_id].seq_stats_max_seq_no

if refreshed_number < reference:

refreshed_number += 2 ** 63 - 1

seq_result[shard_compound_id] = refreshed_number - reference

size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb

refreshed_number = int(shard.seq_stats_max_seq_no or 0)

reference = int(reference_shards[shard_compound_id].seq_stats_max_seq_no or 0)

if refreshed_number < reference:

# Handle signed-long overflow: wrap around using full 64-bit modulus

refreshed_number += 1 << 64

# Or simply: seq_result[shard_compound_id] = (refreshed_number - reference) % (1 << 64)

seq_result[shard_compound_id] = refreshed_number - reference

size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 954–961 and cratedb_toolkit/admin/xmover/util/database.py (row[12]) the code can encounter NULL seq_stats_max_seq_no and uses an incorrect wrap-around fixup; coalesce DB seq_no to 0 in database.py (and update the model) so callers never see None, and in shard.py cast seq values to int with a default of 0 and compute the delta using full 64-bit modular arithmetic (e.g. (refreshed - reference) % (1 << 64)) instead of adding 2**63 - 1 when refreshed < reference so wrap-around is handled correctly.

coderabbitai · 2025-08-25T11:22:25Z

cratedb_toolkit/admin/xmover/analysis/shard.py

+    def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]:
+        if n_shards < 1:
+            return sorted_shards[:n_shards]
+        else:
+            return sorted_shards


🛠️ Refactor suggestion

Top-N bug: slices applied only when n_shards < 1

This logic shows all shards for positive N and slices only for negative values. Invert the condition so positive n_shards returns the top-N.
[uggest_essential_refactor]

- def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]: - if n_shards < 1: - return sorted_shards[:n_shards] - else: - return sorted_shards + def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]: + if n_shards > 0: + return sorted_shards[:n_shards] + return sorted_shards

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]:

if n_shards < 1:

return sorted_shards[:n_shards]

else:

return sorted_shards

def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]:

if n_shards > 0:

return sorted_shards[:n_shards]

return sorted_shards

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 977 to 981, the condition is inverted so slices are applied only when n_shards is negative; change the logic to apply slicing when n_shards is positive: if n_shards > 0 return sorted_shards[:n_shards], else return sorted_shards (or return an empty list if you want n_shards == 0 to yield no shards).

coderabbitai · 2025-08-25T11:22:25Z

cratedb_toolkit/admin/xmover/cli.py

+@main.command()
+@click.option("--table", "-t", default=None, help="Analyze specific table only")
+@click.option("--wait-time", "-w", default=10, help="The number of seconds to wait before checking the shards again. The more the wait the more accurate the results will be (default: 10)")
+@click.option("--repeat", "-r", default=1, help="The number of times the shards will be checked. The more times the more accurate the results will be. Use -1 for continuous check (default 1)")
+@click.option("--max-results", "-m", default=40, help="The number of shards that will be displayed (default: 40)")
+@click.option("--sort-by", "-s", default="heat", help="How the shard table is sorted. Valid values are heat, node or table (default: heat)")
+@click.pass_context


🛠️ Refactor suggestion

⚠️ Potential issue

Fix Ruff E501: wrap long Click option lines and validate --sort-by choices

The three @click.option lines exceed the 120-char limit and break the pipeline. Also, constrain --sort-by to the supported values to catch typos early.

Apply this diff to wrap and validate:

-@main.command() -@click.option("--table", "-t", default=None, help="Analyze specific table only") -@click.option("--wait-time", "-w", default=10, help="The number of seconds to wait before checking the shards again. The more the wait the more accurate the results will be (default: 10)") -@click.option("--repeat", "-r", default=1, help="The number of times the shards will be checked. The more times the more accurate the results will be. Use -1 for continuous check (default 1)") -@click.option("--max-results", "-m", default=40, help="The number of shards that will be displayed (default: 40)") -@click.option("--sort-by", "-s", default="heat", help="How the shard table is sorted. Valid values are heat, node or table (default: heat)") +@main.command(name="monitor-shards") +@click.option("--table", "-t", default=None, help="Analyze specific table only") +@click.option( + "--wait-time", + "-w", + default=10, + help=( + "Seconds to wait between checks. " + "Longer waits yield more accurate deltas (default: 10)." + ), +) +@click.option( + "--repeat", + "-r", + default=1, + help=( + "Number of iterations to run. Use -1 for continuous monitoring " + "(default: 1)." + ), +) +@click.option( + "--max-results", + "-m", + default=40, + help="Maximum number of shards to display (default: 40)", +) +@click.option( + "--sort-by", + "-s", + type=click.Choice(["heat", "node", "table"]), + default="heat", + help="Sort by heat, node, or table (default: heat)", +)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@main.command()

@click.option("--table", "-t", default=None, help="Analyze specific table only")

@click.option("--wait-time", "-w", default=10, help="The number of seconds to wait before checking the shards again. The more the wait the more accurate the results will be (default: 10)")

@click.option("--repeat", "-r", default=1, help="The number of times the shards will be checked. The more times the more accurate the results will be. Use -1 for continuous check (default 1)")

@click.option("--max-results", "-m", default=40, help="The number of shards that will be displayed (default: 40)")

@click.option("--sort-by", "-s", default="heat", help="How the shard table is sorted. Valid values are heat, node or table (default: heat)")

@click.pass_context

@main.command(name="monitor-shards")

@click.option("--table", "-t", default=None, help="Analyze specific table only")

@click.option(

"--wait-time",

"-w",

default=10,

help=(

"Seconds to wait between checks. "

"Longer waits yield more accurate deltas (default: 10)."

),

)

@click.option(

"--repeat",

"-r",

default=1,

help=(

"Number of iterations to run. Use -1 for continuous monitoring "

"(default: 1)."

),

)

@click.option(

"--max-results",

"-m",

default=40,

help="Maximum number of shards to display (default: 40)",

)

@click.option(

"--sort-by",

"-s",

type=click.Choice(["heat", "node", "table"]),

default="heat",

help="Sort by heat, node, or table (default: heat)",

)

@click.pass_context

🧰 Tools

🪛 Ruff (0.12.2)

67-67: Line too long (188 > 120)

(E501)

68-68: Line too long (192 > 120)

(E501)

70-70: Line too long (140 > 120)

(E501)

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/cli.py around lines 65 to 71, several @click.option decorators exceed the 120-char limit and the --sort-by option is not constrained; wrap the long option lines to keep them under 120 characters (split arguments across lines so each line is <=120 chars) and change the --sort-by option to use click.Choice(["heat","node","table"]) (with an appropriate default of "heat") so invalid values are rejected early; keep other option defaults and help text the same.

coderabbitai · 2025-08-25T11:22:26Z

cratedb_toolkit/admin/xmover/util/database.py

+            s.routing_state,
+            s.seq_no_stats['max_seq_no'],
+            s.seq_no_stats['global_checkpoint'],
+            s.seq_no_stats['local_checkpoint']
        FROM sys.shards s


💡 Verification agent

🧩 Analysis chain

Seq-no stats can be NULL for non-STARTED shards; coalesce at the source

When for_analysis=True, non-STARTED shards may return NULLs for seq_no_stats. Downstream code does arithmetic on these fields and will crash. Coalesce to 0 in SQL to guarantee numeric values.

- s.routing_state, - s.seq_no_stats['max_seq_no'], - s.seq_no_stats['global_checkpoint'], - s.seq_no_stats['local_checkpoint'] + s.routing_state, + COALESCE(s.seq_no_stats['max_seq_no'], 0), + COALESCE(s.seq_no_stats['global_checkpoint'], 0), + COALESCE(s.seq_no_stats['local_checkpoint'], 0)

🏁 Script executed:

#!/bin/bash set -eux # Locate the target file file=$(find . -type f -path "*cratedb_toolkit/admin/xmover/util/database.py" -print -quit) echo "Inspecting $file" # Display context around the SQL snippet sed -n '130,200p' "$file" # Find where for_analysis parameter is referenced in this file grep -n "for_analysis" "$file" # Search the entire repo for usage of max_seq_no to locate downstream arithmetic grep -R "max_seq_no" -n .

Length of output: 4061

🏁 Script executed:

#!/bin/bash set -eux # Display the function definition containing the seq_no_stats query file="./cratedb_toolkit/admin/xmover/util/database.py" echo "===== Part of database.py (lines 1-140) =====" sed -n '1,140p' "$file" # Display the ShardInfo model to see how seq_stats_* fields are defined echo "===== Part of model.py (lines 1-200) =====" sed -n '1,200p' cratedb_toolkit/admin/xmover/model.py

Length of output: 9969

Coalesce seq_no_stats at the source to avoid NULLs for non-STARTED shards

When for_analysis=True, the SQL will include shards in states other than STARTED, and s.seq_no_stats[...] can be NULL. Since downstream code assigns these directly to integer fields and performs arithmetic, those NULLs will cause runtime errors. Coalesce to 0 in the SELECT to guarantee non-null numeric values.

• File: cratedb_toolkit/admin/xmover/util/database.py
Lines: 146–150 (in the get_shards_info query)

Suggested diff:

s.routing_state, - s.seq_no_stats['max_seq_no'], - s.seq_no_stats['global_checkpoint'], - s.seq_no_stats['local_checkpoint'] + COALESCE(s.seq_no_stats['max_seq_no'], 0) AS max_seq_no, + COALESCE(s.seq_no_stats['global_checkpoint'], 0) AS global_checkpoint, + COALESCE(s.seq_no_stats['local_checkpoint'], 0) AS local_checkpoint

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

s.routing_state,

s.seq_no_stats['max_seq_no'],

s.seq_no_stats['global_checkpoint'],

s.seq_no_stats['local_checkpoint']

FROM sys.shards s

s.routing_state,

COALESCE(s.seq_no_stats['max_seq_no'], 0) AS max_seq_no,

COALESCE(s.seq_no_stats['global_checkpoint'], 0) AS global_checkpoint,

COALESCE(s.seq_no_stats['local_checkpoint'], 0) AS local_checkpoint

FROM sys.shards s

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/util/database.py around lines 146–150, the SELECT uses s.seq_no_stats['max_seq_no'], s.seq_no_stats['global_checkpoint'], and s.seq_no_stats['local_checkpoint'] directly which can be NULL when for_analysis=True (non-STARTED shards); update the query to COALESCE each of those expressions to 0 in the SELECT so the returned values are guaranteed non-null integers and downstream arithmetic/assignments won’t error.

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (8)

cratedb_toolkit/admin/xmover/analysis/shard.py (8)
4-4: Remove unused import to fix Ruff F401

The datetime import is unused and flagged by static analysis.

8-8: Remove unused import to fix Ruff F401

The sleep import from time is unused and flagged by static analysis.

10-10: Remove unused import to fix Ruff F401

The unittest.result import is unused and flagged by static analysis.

922-922: Remove duplicate table printing

The function prints the table internally and the caller also prints it, causing duplicate output.

942-949: Fix potential None handling and rollover arithmetic

The code doesn't handle potential None values for seq_stats_max_seq_no and uses incorrect rollover arithmetic.

965-969: Fix inverted logic for top-N shard filtering

The condition is inverted - it shows all shards for positive N and slices only for negative values.

853-853: Fix Python 3.9 compatibility and line length

The method signature uses PEP 604 union syntax and exceeds line length limits.

Apply this diff to fix both issues:
-    def monitor_shards(self, table_filter: str | None, interval_in_seconds: int = 5, repeat: int = 10, n_shards: int = 40, sort_by: str = 'heat'):
+    def monitor_shards(
+        self,
+        table_filter: Optional[str],
+        interval_in_seconds: int = 5,
+        repeat: int = 10,
+        n_shards: int = 40,
+        sort_by: str = 'heat',
+    ):
862-874: Implement proper watch functionality and fix infinite loop

The current implementation has several issues:

Infinite loop without proper exit condition for non-watch mode

Missing watch parameter functionality

Line length violations

Apply this diff to fix the monitoring logic:
         iterations = 0
+        go_live = False  # TODO: wire from CLI watch parameter
         while True:
             sleep(interval_in_seconds)
             self.refresh_data()
-            shards_table = self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas)
+            shards_table = self.generate_shards_table(
+                self._get_top_shards(self.latest_shards, n_shards), 
+                self.seq_deltas
+            )
             console.print(shards_table)
-            nodes_table = self.generate_nodes_table(self._get_nodes_heat_info(self.reference_shards, self.seq_deltas))
+            nodes_table = self.generate_nodes_table(
+                self._get_nodes_heat_info(self.reference_shards, self.seq_deltas)
+            )
             console.print(nodes_table)
 
             iterations += 1
             if 0 < repeat <= iterations:
                 break

🧹 Nitpick comments (2)

cratedb_toolkit/admin/xmover/analysis/shard.py (2)

956-956: Fix line length violation

The line exceeds the 120-character limit.

Apply this diff to wrap the long line:

-        updated_shards: list[ShardInfo] = [s for s in self.analyzer.shards if not self.table_filter or self.table_filter == s.table_name]
+        updated_shards: list[ShardInfo] = [
+            s for s in self.analyzer.shards 
+            if not self.table_filter or self.table_filter == s.table_name
+        ]

958-962: Fix line length violations in sorting logic

Multiple lines exceed the 120-character limit.

Apply this diff to wrap the long lines:

         if self.sort_by == 'heat':
-            self.latest_shards = sorted(updated_shards, key=lambda s: self.seq_deltas[self._get_shard_compound_id(s)],
-                                        reverse=True)
+            self.latest_shards = sorted(
+                updated_shards, 
+                key=lambda s: self.seq_deltas[self._get_shard_compound_id(s)],
+                reverse=True
+            )
         else:
-            self.latest_shards = sorted(updated_shards, key=lambda s: self._get_shard_compound_id(s))
+            self.latest_shards = sorted(
+                updated_shards, 
+                key=lambda s: self._get_shard_compound_id(s)
+            )

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2b2cfb6 and 8b2ce40.

📒 Files selected for processing (1)

cratedb_toolkit/admin/xmover/analysis/shard.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

cratedb_toolkit/admin/xmover/analysis/shard.py (3)

cratedb_toolkit/admin/xmover/model.py (2)

ShardInfo (33-54)

size_gb (84-86)

cratedb_toolkit/admin/xmover/cli.py (1)

monitor_shards (72-77)

cratedb_toolkit/admin/xmover/util/format.py (1)

format_size (1-8)

🪛 Ruff (0.12.2)

cratedb_toolkit/admin/xmover/analysis/shard.py

4-4: datetime imported but unused

Remove unused import: datetime

(F401)

10-10: unittest.result imported but unused

Remove unused import: unittest.result

(F401)

16-16: rich.live.Live imported but unused

Remove unused import: rich.live.Live

(F401)

853-853: Line too long (146 > 120)

(E501)

866-866: Line too long (122 > 120)

(E501)

956-956: Line too long (137 > 120)

(E501)

🪛 GitHub Actions: Tests: Common

cratedb_toolkit/admin/xmover/analysis/shard.py

[error] 853-853: pytest -m pymongo failed during collection: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' in ShardMonitor.monitor_shards signature. This uses Python 3.10+'s PEP 604 union syntax ('str | None'), not supported on Python 3.9. Replace with Optional[str] or upgrade Python.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-and-publish
GitHub Check: build-and-publish

coderabbitai · 2025-08-25T11:31:28Z

cratedb_toolkit/admin/xmover/analysis/shard.py

 from rich.console import Console
 from rich.panel import Panel
 from rich.table import Table
+from rich.live import Live


⚠️ Potential issue

Remove unused import to fix Ruff F401

The Live import from rich.live is unused and flagged by static analysis.

-from rich.live import Live

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from rich.live import Live

-from rich.live import Live

🧰 Tools

🪛 Ruff (0.12.2)

16-16: rich.live.Live imported but unused

Remove unused import: rich.live.Live

(F401)

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around line 16 the import "from rich.live import Live" is unused and triggers Ruff F401; remove that import statement (or, if Live is intended to be used later, use it where required) so the unused symbol is gone and re-run linters to confirm the warning is resolved.

coderabbitai · 2025-08-25T11:31:28Z

cratedb_toolkit/admin/xmover/analysis/shard.py

+class ShardMonitor:
+    def __init__(self, analyzer: ShardAnalyzer):
+        self.analyzer = analyzer
+        self.reference_shards: dict[str, ShardInfo]
+        self.latest_shards: list[ShardInfo]
+        self.seq_deltas: dict[str, int]
+        self.size_deltas: dict[str, float]
+
+        self.table_filter: str|None = None
+        self.sort_by: str = 'heat'
+


⚠️ Potential issue

Fix Python 3.9 compatibility and improve type annotations

The code uses PEP 604 union syntax (str | None) which breaks on Python 3.9 and has missing type annotations for instance variables.

Apply this diff to fix compatibility and type annotations:

+from typing import Optional + class ShardMonitor: def __init__(self, analyzer: ShardAnalyzer): self.analyzer = analyzer - self.reference_shards: dict[str, ShardInfo] - self.latest_shards: list[ShardInfo] - self.seq_deltas: dict[str, int] - self.size_deltas: dict[str, float] + self.reference_shards: dict[str, ShardInfo] = {} + self.latest_shards: list[ShardInfo] = [] + self.seq_deltas: dict[str, int] = {} + self.size_deltas: dict[str, float] = {} - self.table_filter: str|None = None + self.table_filter: Optional[str] = None self.sort_by: str = 'heat'

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

class ShardMonitor:

def __init__(self, analyzer: ShardAnalyzer):

self.analyzer = analyzer

self.reference_shards: dict[str, ShardInfo]

self.latest_shards: list[ShardInfo]

self.seq_deltas: dict[str, int]

self.size_deltas: dict[str, float]

self.table_filter: str|None = None

self.sort_by: str = 'heat'

from typing import Optional

class ShardMonitor:

def __init__(self, analyzer: ShardAnalyzer):

self.analyzer = analyzer

self.reference_shards: dict[str, ShardInfo] = {}

self.latest_shards: list[ShardInfo] = []

self.seq_deltas: dict[str, int] = {}

self.size_deltas: dict[str, float] = {}

self.table_filter: Optional[str] = None

self.sort_by: str = 'heat'

🤖 Prompt for AI Agents

In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 842-852, the class uses PEP 604 union syntax and bare built-in generics which break on Python 3.9 and leaves instance attributes uninitialized/unannotated; change type hints to use typing imports (from typing import Dict, List, Optional) and annotate/initialize the attributes, e.g. reference_shards: Dict[str, ShardInfo] = {} , latest_shards: List[ShardInfo] = [], seq_deltas: Dict[str, int] = {}, size_deltas: Dict[str, float] = {}, and table_filter: Optional[str] = None, keeping sort_by: str = 'heat'. Ensure typing imports are added at top of file.

cratedb_toolkit/admin/xmover/analysis/shard.py

WalBeh and others added 17 commits August 21, 2025 02:13

Admin/XMover: Add XMover - CrateDB Shard Analyzer and Movement Tool

a9b9771

Admin/XMover: Code formatting and linting

70a3bf6

Admin/XMover: Type checking

4efd1ed

Admin/XMover: Implement suggestions by CodeRabbit

163001e

Admin/XMover: Add software tests

de9ba97

Admin/XMover: Refactor -- "recovery"

0d7fcb5

Admin/XMover: Refactor -- "recommender"

99f941d

Admin/XMover: Refactor -- "reporter"

2dc5b91

Admin/XMover: Refactor -- analyze -- shard vs. zone

745b23a

Admin/XMover: Refactor -- tune

d2cff3f

Admin/XMover: Refactor -- util

a91c8a1

Admin/XMover: Naming things -- analyze vs. operational

2fb889d

Admin/XMover: Naming things. OO.

d009652

Admin/XMover: Naming things. s/recover/monitor/

eb7aa7d

Admin/XMover: Suppress SSL warnings when SSL verification is disabled

46e9c00

Admin/XMover: Add shard distribution analysis for (large) tables

3704c96

amotl mentioned this pull request Aug 21, 2025

Next steps after XMover #526

Open

coderabbitai bot reviewed Aug 21, 2025

View reviewed changes

amotl changed the title ~~XMover: Real-time shard monitoring~~ XMover: Shard monitoring Aug 21, 2025

amotl changed the title ~~XMover: Shard monitoring~~ Admin/XMover: Shard hotness monitoring, optionally watching Aug 21, 2025

coderabbitai bot reviewed Aug 21, 2025

View reviewed changes

amotl commented Aug 22, 2025

View reviewed changes

amotl changed the title ~~Admin/XMover: Shard hotness monitoring, optionally watching~~ Admin/XMover: Shard heat monitoring, optionally watching Aug 22, 2025

coderabbitai bot reviewed Aug 25, 2025

View reviewed changes

WalBeh and others added 3 commits September 5, 2025 11:32

Admin/XMover: Add module for active shard monitoring

256ff3d

WIP: Some silly numbers

eb24a50

Clean up. Some early results can be seen already

8e1092f

juanpardo added 6 commits September 10, 2025 17:23

It now displays live the 25 hottest shards

2c4e648

Cleaned up the ui

cab4595

Fixed a bug and cleaned up the code. Added parameters

97c6305

Rearranged methods by importance, dropped debug display values

d3f45ef

fixup! Rearranged methods by importance, dropped debug display values

711f71a

Cleaned up some display messaging

92e1d81

juanpardo force-pushed the jp/xmover-shard-hot-monitor branch from 2d98ae3 to 92e1d81 Compare September 10, 2025 15:31

amotl force-pushed the xmover branch 2 times, most recently from dac2a6d to cb05b4d Compare September 10, 2025 23:21

amotl force-pushed the xmover branch 5 times, most recently from 3b3942f to 18c7134 Compare September 23, 2025 21:24

amotl mentioned this pull request Sep 23, 2025

Looking-glass utilities for exploring what's going on: CrateDB shard analysis and beyond #513

Open

		def _get_shard_compound_id(self, shard: ShardInfo) -> str:
		return f"{shard.node_id}-{shard.shard_id}"

Admin/XMover: Shard heat monitoring, optionally watching #527

Are you sure you want to change the base?

Admin/XMover: Shard heat monitoring, optionally watching #527

Uh oh!

Conversation

amotl commented Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

About

Install

Details

Uh oh!

coderabbitai bot commented Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Aug 21, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Aug 21, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Aug 21, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

amotl Aug 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Footnotes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

amotl commented Aug 21, 2025 •

edited

Loading

coderabbitai bot commented Aug 21, 2025 •

edited

Loading

amotl Aug 22, 2025 •

edited

Loading