Skip to content

Conversation

amotl
Copy link
Member

@amotl amotl commented Aug 21, 2025

About

At GH-526, we are thinking about next steps in user ergonomics when debugging a busy cluster. Based on the idea of computing deltas of certain metrics available through CrateDB's system tables about shards, @juanpardo starts exploring a few ideas here. Thanks!

Install

uv pip install --upgrade 'cratedb-toolkit @ git+https://github.com/crate/cratedb-toolkit.git@jp/xmover-shard-hot-monitor'

Details

Static reports are always topmost priority. In an optional variant, a --watch-like interval-updating view could also make sense. However, it is difficult to learn about historic progress/changes in this case, i.e. when leaving the screen for a while, so discrete output is always preferred as the primary operation mode.

@amotl amotl mentioned this pull request Aug 21, 2025
Copy link

coderabbitai bot commented Aug 21, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds a ShardMonitor class and CLI subcommand to periodically monitor hot shards (computing seq/size deltas and rendering tables), extends ShardInfo with three seq_no_stats fields, updates DB queries to return those fields, and introduces duplicate ShardMonitor declarations.

Changes

Cohort / File(s) Summary of changes
Monitoring feature (UI & CLI)
cratedb_toolkit/admin/xmover/analysis/shard.py, cratedb_toolkit/admin/xmover/cli.py
Adds ShardMonitor (init, monitor_shards, refresh, delta calculations, UI table generation) and a CLI monitor_shards subcommand; monitoring loops with interval/repeat and live rendering. Note: ShardMonitor is declared twice in the same module.
Data model update
cratedb_toolkit/admin/xmover/model.py
Adds three int fields to ShardInfo: seq_stats_max_seq_no, seq_stats_global_checkpoint, seq_stats_local_checkpoint (no defaults; changes dataclass signature).
Database query changes
cratedb_toolkit/admin/xmover/util/database.py
get_shards_info selects three additional s.seq_no_stats[...] columns and maps them into the new ShardInfo fields; row shape extended.
Imports / utilities
cratedb_toolkit/admin/xmover/analysis/shard.py (imports)
Adds datetime, sleep (from time), Live (from rich.live) and uses format_size/delta mappings for UI. Patch includes an extraneous unittest.result import.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as xmover CLI
  participant Analyzer as ShardAnalyzer
  participant DB as CrateDB
  participant Monitor as ShardMonitor
  participant UI as Rich Console

  User->>CLI: run "xmover monitor_shards"
  CLI->>Analyzer: construct(client)
  CLI->>Monitor: construct(analyzer)
  loop polling
    Monitor->>Analyzer: refresh shards
    Analyzer->>DB: SELECT ... (+ seq_no_stats columns)
    DB-->>Analyzer: rows
    Analyzer-->>Monitor: list[ShardInfo]
    Monitor->>Monitor: compute seq/size deltas, update reference map
    Monitor->>UI: render shards table and nodes heat panel
    Monitor->>Monitor: sleep / repeat
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitch my whiskers at the stream,
Hot shards shimmer, numbers gleam,
Deltas dance across each node,
I hop to watch the monitoring road.
Tiny rabbit, metrics in steam 🐇

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch jp/xmover-shard-hot-monitor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cratedb_toolkit/admin/xmover/analysis/shard.py (1)

1-15: Remove commented-out code to satisfy ERA001 lint checks

CI linting shows remaining ERA001 errors in cratedb_toolkit/admin/xmover/analysis/shard.py. Please delete all commented-out shards_table.add_column(...) and related lines, then re-run the formatter and linter:

• File: cratedb_toolkit/admin/xmover/analysis/shard.py
– Remove lines 871–873:
python # shards_table.add_column("Global Checkpoint", style="magenta") # shards_table.add_column("Local Checkpoint", style="magenta") # shards_table.add_column("Hot Timestamp", style="magenta")
– Remove lines 898–900:
python # str(shard.seq_stats_global_checkpoint), # str(shard.seq_stats_local_checkpoint), # str(hot_timestamp) if shard.hot_timestamp else "-",

After cleanup, run:

ruff format cratedb_toolkit/admin/xmover/analysis/shard.py
ruff check --fix cratedb_toolkit/admin/xmover/analysis/shard.py

to confirm no lint errors remain.

🧹 Nitpick comments (1)
cratedb_toolkit/admin/xmover/cli.py (1)

65-73: Add watch/interval flags to align with PR objective (optional)

To deliver the “--watch-like real-time updating view,” consider adding --watch and --refresh-interval options (mirroring monitor_recovery) and pass them to ShardMonitor.monitor_shards(watch, refresh_interval). Otherwise, keep this as a one-shot snapshot and document accordingly.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5068671 and 0ed4a5f.

📒 Files selected for processing (4)
  • cratedb_toolkit/admin/xmover/analysis/shard.py (2 hunks)
  • cratedb_toolkit/admin/xmover/cli.py (2 hunks)
  • cratedb_toolkit/admin/xmover/model.py (1 hunks)
  • cratedb_toolkit/admin/xmover/util/database.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
cratedb_toolkit/admin/xmover/analysis/shard.py

8-8: time.sleep imported but unused

Remove unused import: time.sleep

(F401)


871-871: Found commented-out code

Remove commented-out code

(ERA001)


872-872: Found commented-out code

Remove commented-out code

(ERA001)


873-873: Found commented-out code

Remove commented-out code

(ERA001)


898-898: Found commented-out code

Remove commented-out code

(ERA001)


899-899: Found commented-out code

Remove commented-out code

(ERA001)


900-900: Found commented-out code

Remove commented-out code

(ERA001)

🪛 GitHub Actions: Tests: Common
cratedb_toolkit/admin/xmover/analysis/shard.py

[error] 1-1: ruff format --check would reformat cratedb_toolkit/admin/xmover/analysis/shard.py. 1 file would be reformatted. Lint step 'lint[0]' aborted with exit code 1.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-and-publish
  • GitHub Check: build-and-publish
🔇 Additional comments (4)
cratedb_toolkit/admin/xmover/model.py (1)

32-51: All ShardInfo instantiations have been updated—no further action required

  • Verified a single instantiation in cratedb_toolkit/admin/xmover/util/database.py (lines 166–183)
    – All 16 parameters of the updated ShardInfo dataclass are passed by name and in the correct order.
cratedb_toolkit/admin/xmover/cli.py (1)

13-13: Import looks correct and aligns with new feature wiring

cratedb_toolkit/admin/xmover/util/database.py (2)

179-183: ShardInfo mapping expects Optional[int] after model change

Once ShardInfo types are Optional[int], these assignments are fine even if the query returns NULL. If you prefer strict ints, add COALESCE in SQL or convert here.


150-155: Refactor hot_timestamp extraction to avoid unsupported array lambdas

CrateDB does not support higher‐order functions like transform(array, lambda) nor array_max on arrays of objects—it only provides array_max(array) for primitive arrays (cratedb.com).

Instead, guard against out-of-bounds access with a length check (for example, using cardinality) and fall back to NULL when the second lease isn’t present. You can replace the direct index access:

-    s.routing_state,
-    s.seq_no_stats['max_seq_no'],
-    s.seq_no_stats['global_checkpoint'],
-    s.seq_no_stats['local_checkpoint'],
-    s.retention_leases['leases'][1]['timestamp']
+    s.routing_state,
+    s.seq_no_stats['max_seq_no']        AS max_seq_no,
+    s.seq_no_stats['global_checkpoint'] AS global_checkpoint,
+    s.seq_no_stats['local_checkpoint']  AS local_checkpoint,
+    CASE
+      WHEN cardinality(s.retention_leases['leases']) > 1
+      THEN s.retention_leases['leases'][1]['timestamp']
+      ELSE NULL
+    END AS hot_timestamp

This avoids runtime errors when leases has fewer than two elements (and still returns the second-element timestamp when available).

If you truly need the latest lease timestamp across all objects, you must either:

  • Flatten and aggregate in your application layer (since CrateDB lacks array lambdas), or
  • Define a custom UDF (e.g., in JavaScript) to compute the max timestamp on the server side.

Likely an incorrect or invalid review comment.

Comment on lines 841 to 857
def __init__(self, analyzer: ShardAnalyzer):
self.analyzer = analyzer
self.shard_time_deltas: dict[str, int] = {}
self.shard_seq_deltas: dict[str, int] = {}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Store computed deltas and avoid key collisions across tables

The dictionaries are typed for int values, but hot deltas can be absent. Also, keying by node_id-shard_id risks collisions across tables. Make time deltas Optional[int] and use a compound key including schema+table.

Apply this diff:

 class ShardMonitor:
     def __init__(self, analyzer: ShardAnalyzer):
         self.analyzer = analyzer
-        self.shard_time_deltas: dict[str, int] = {}
-        self.shard_seq_deltas: dict[str, int] = {}
+        self.shard_time_deltas: dict[str, Optional[int]] = {}
+        self.shard_seq_deltas: dict[str, int] = {}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 841 to 845, the
two dicts currently typed as dict[str, int] and keyed only by node_id-shard_id
can lose absent (hot) deltas and collide across different tables; change their
type annotations to dict[str, Optional[int]] (or dict[tuple[str,str,str],
Optional[int]] if you prefer tuple keys) to allow None for missing deltas, and
change the keying to include schema and table (e.g. compound key using
schema.table plus node_id and shard_id or a tuple of (schema, table, node_id,
shard_id)) so entries are unique across tables; update all places that
read/write these dicts to handle Optional[int] (check for None) and to
construct/parse the new compound key consistently.

Comment on lines 48 to 51
seq_stats_max_seq_no: int
seq_stats_global_checkpoint: int
seq_stats_local_checkpoint: int
hot_timestamp: int
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Make new ShardInfo fields Optional with defaults to avoid None-related crashes and ease compatibility

seq_no_stats and retention_leases can be missing/null on some shards. With strict int typing and no defaults, construction will break and downstream code (e.g., datetime.fromtimestamp) can crash. Recommend Optional[int] with default None.

Apply this diff:

-    seq_stats_max_seq_no: int
-    seq_stats_global_checkpoint: int
-    seq_stats_local_checkpoint: int
-    hot_timestamp: int
+    seq_stats_max_seq_no: Optional[int] = None
+    seq_stats_global_checkpoint: Optional[int] = None
+    seq_stats_local_checkpoint: Optional[int] = None
+    hot_timestamp: Optional[int] = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
seq_stats_max_seq_no: int
seq_stats_global_checkpoint: int
seq_stats_local_checkpoint: int
hot_timestamp: int
seq_stats_max_seq_no: Optional[int] = None
seq_stats_global_checkpoint: Optional[int] = None
seq_stats_local_checkpoint: Optional[int] = None
hot_timestamp: Optional[int] = None
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/model.py around lines 48 to 51, the integer
fields seq_stats_max_seq_no, seq_stats_global_checkpoint,
seq_stats_local_checkpoint, and hot_timestamp must be made optional with
defaults to avoid construction crashes when values are missing/null; change
their type annotations to Optional[int] and give them a default of None, and
ensure Optional is imported from typing (or typing import Optional is added at
top); likewise, review nearby shard fields like seq_no_stats and
retention_leases and apply Optional with default None where they can be absent.

@amotl amotl changed the title XMover: Real-time shard monitoring XMover: Shard monitoring Aug 21, 2025
@amotl amotl changed the title XMover: Shard monitoring Admin/XMover: Shard hotness monitoring, optionally watching Aug 21, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (6)
cratedb_toolkit/admin/xmover/analysis/shard.py (6)

884-890: Make interval configurable and refresh both snapshots

Hardcoding 10 seconds and not refreshing the baseline snapshot may surprise users. Expose interval and always refresh both snapshots to get a consistent pair.

Apply this diff:

-    def monitor_shards(self):
-        max_seconds_old = 300
-
-        self.refresh_deltas(refresh_data=False)
-        sleep(10)
-        self.refresh_deltas()
+    def monitor_shards(self, interval_seconds: int = 10, max_seconds_old: int = 300):
+        # Take two fresh snapshots separated by interval_seconds
+        self.refresh_deltas(refresh_data=True)
+        if interval_seconds > 0:
+            sleep(interval_seconds)
+        self.refresh_deltas(refresh_data=True)

891-893: Sort by hottest (largest seq delta first) and wrap long line

Current sort is ascending and exceeds line length. Sort by delta desc to surface hot shards first and break lines.

Apply this diff:

-        shards: list[ShardInfo] = self.filter_shards(max_seconds_old)
-        sorted_shards: list[ShardInfo] = sorted(shards, key=lambda s: self.shard_seq_deltas[self._get_shard_compound_id(s)])
+        shards: list[ShardInfo] = self.filter_shards(max_seconds_old)
+        sorted_shards: list[ShardInfo] = sorted(
+            shards,
+            key=lambda s: self.shard_seq_deltas.get(self._get_shard_compound_id(s), 0),
+            reverse=True,
+        )

907-918: Render human-friendly age and guard missing deltas

Show "-" when age is unknown and add "s" suffix; guard seq delta lookup.

Apply this diff:

         for shard in sorted_shards:
             shard_compound_id = self._get_shard_compound_id(shard)
             shards_table.add_row(
                 shard.schema_name,
                 shard.table_name,
                 str(shard.shard_id),
                 shard.node_name,
                 str(shard.is_primary),
                 format_size(shard.size_gb),
-                str(self.shard_seq_deltas[shard_compound_id]),
-                str(self.shard_time_deltas[shard_compound_id]),
+                str(self.shard_seq_deltas.get(shard_compound_id, 0)),
+                (f"{self.shard_time_deltas.get(shard_compound_id)}s"
+                 if self.shard_time_deltas.get(shard_compound_id) is not None
+                 else "-"),
             )

849-849: Trim trailing whitespace on blank lines

These lines trigger W293 in Ruff. Remove stray spaces.

Also applies to: 922-922


863-865: Align model typing for hot_timestamp with runtime usage

This code treats hot_timestamp as Optional[int], but ShardInfo currently annotates it as int (see cratedb_toolkit/admin/xmover/model.py:32-54). If None is possible from sys.shards, update the dataclass to Optional[int] and ensure util/database populates None accordingly.

Would you like me to patch model.py and util/database.py to reflect Optional[int] and add unit tests covering None timestamps?

Also applies to: 878-883


884-923: Optional: add --watch loop using Rich Live for continuous mode

Given the PR goal, consider a watch loop (opt-in) that refreshes and re-renders at interval_seconds using rich.live.Live. This keeps discrete mode as default while enabling continuous monitoring via a flag wired in CLI.

If you want, I can draft the Live-based loop and a CLI flag (e.g., --watch/--interval).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0ed4a5f and 449a8a9.

📒 Files selected for processing (1)
  • cratedb_toolkit/admin/xmover/analysis/shard.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cratedb_toolkit/admin/xmover/analysis/shard.py (3)
cratedb_toolkit/admin/xmover/model.py (2)
  • ShardInfo (33-55)
  • size_gb (85-87)
cratedb_toolkit/admin/xmover/cli.py (1)
  • monitor_shards (67-72)
cratedb_toolkit/admin/xmover/util/format.py (1)
  • format_size (1-8)
🪛 Ruff (0.12.2)
cratedb_toolkit/admin/xmover/analysis/shard.py

849-849: Blank line contains whitespace

Remove whitespace from blank line

(W293)


867-867: Test for membership should be not in

Convert to not in

(E713)


870-870: Line too long (128 > 120)

(E501)


874-874: Variable id is shadowing a Python builtin

(A001)


892-892: Line too long (124 > 120)

(E501)


922-922: Blank line contains whitespace

Remove whitespace from blank line

(W293)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-publish
  • GitHub Check: build-and-test
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: CFR for OS macos-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: CFR for OS macos-13
🔇 Additional comments (2)
cratedb_toolkit/admin/xmover/analysis/shard.py (2)

8-8: Resolved: sleep import now used by monitor_shards()

The prior lint warning about an unused import is addressed; sleep is used on Line 888. No action needed here.


840-921: ShardMonitor definitions, CLI flags, and hot_timestamp hydration all verified

  • Only one ShardMonitor class exists (in analysis/shard.py).
  • CLI exposes --watch/-w and --refresh-interval flags in cratedb_toolkit/admin/xmover/cli.py.
  • hot_timestamp is correctly read from the DB (row[15] in database.py), stored as an integer in the model, and converted from milliseconds to seconds in analysis/shard.py.

No further action required.

Comment on lines 847 to 849
def _get_shard_compound_id(self, shard: ShardInfo) -> str:
return f"{shard.node_id}-{shard.shard_id}"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Compound shard key risks collisions across tables

Keying by node_id-shard_id can collide between tables sharing the same shard id on the same node. Include schema and table to guarantee uniqueness.

Apply this diff:

     def _get_shard_compound_id(self, shard: ShardInfo) -> str:
-        return f"{shard.node_id}-{shard.shard_id}"
+        # schema.table:shard_id@node_id
+        return f"{shard.schema_name}.{shard.table_name}:{shard.shard_id}@{shard.node_id}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _get_shard_compound_id(self, shard: ShardInfo) -> str:
return f"{shard.node_id}-{shard.shard_id}"
def _get_shard_compound_id(self, shard: ShardInfo) -> str:
# schema.table:shard_id@node_id
return f"{shard.schema_name}.{shard.table_name}:{shard.shard_id}@{shard.node_id}"
🧰 Tools
🪛 Ruff (0.12.2)

849-849: Blank line contains whitespace

Remove whitespace from blank line

(W293)

🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 847-849 the
compound key uses only node_id and shard_id which can collide for different
tables on the same node; change the function to include schema and table
identifiers (e.g. schema and table name) along with node_id and shard_id to
guarantee uniqueness, and defensively handle missing attributes (use available
schema/table fields or sensible fallbacks) so callers continue to work.

Comment on lines 65 to 72
@main.command()
@click.pass_context
def monitor_shards(ctx):
"""Monitor shards, pointing out hot ones"""
client = ctx.obj["client"]
analyzer = ShardAnalyzer(client)
monitor = ShardMonitor(analyzer)
monitor.monitor_shards()
Copy link
Member Author

@amotl amotl Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea about »naming things«: xmover monitor-shards certainly fits the bill, but what do you think about xmover shard-heat 1 to better reflect its intentions?

Footnotes

  1. ... with an optional --watch option in the long run, if we can make the program also produce static one-shot reports in a future iteration?

@amotl amotl changed the title Admin/XMover: Shard hotness monitoring, optionally watching Admin/XMover: Shard heat monitoring, optionally watching Aug 22, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

♻️ Duplicate comments (1)
cratedb_toolkit/admin/xmover/cli.py (1)

65-65: Command naming: consider shard-heat for intent clarity (alias or rename)

Prior review suggested shard-heat as a better name. You can either rename or add an alias while keeping monitor-shards.

Option A (rename; simplest):

-@main.command(name="monitor-shards")
+@main.command(name="shard-heat")

Option B (add alias while keeping current name):

 @main.command(name="shard-heat")
 @click.option(... same options ...)
 def monitor_shards(...):
     ...
+
+# Register alias for backward-compat
+main.add_command(monitor_shards, name="monitor-shards")
🧹 Nitpick comments (6)
cratedb_toolkit/admin/xmover/cli.py (1)

72-78: Wire a --watch UX to match PR goals (optional live mode)

The PR mentions a watch-style mode; currently repeat=-1 simulates continuous mode but Live rendering toggle is hardcoded elsewhere. Consider adding a --watch/--no-watch flag and passing it through, so we can enable Rich Live when desired.

Proposed minimal CLI change (ShardsMonitor will need a matching parameter):

-@click.pass_context
-def monitor_shards(ctx, table, wait_time, repeat, max_results, sort_by):
+@click.option("--watch/--no-watch", default=False, help="Live-updating view")
+@click.pass_context
+def monitor_shards(ctx, table, wait_time, repeat, max_results, sort_by, watch):
     """Monitor shards, pointing out hot ones"""
     client = ctx.obj["client"]
     analyzer = ShardAnalyzer(client)
     monitor = ShardMonitor(analyzer)
-    monitor.monitor_shards(table, wait_time, repeat, max_results, sort_by)
+    monitor.monitor_shards(table, wait_time, repeat, max_results, sort_by, watch=watch)

If you want, I can follow up with the matching changes in ShardMonitor.monitor_shards.

cratedb_toolkit/admin/xmover/util/database.py (1)

178-181: Cast seq-no fields to int on ingestion to avoid Optional[int] surprises

Make types explicit and robust against None (post-COALESCE this is defensive, but keeps the Python side consistent).

-                    seq_stats_max_seq_no=row[12],
-                    seq_stats_global_checkpoint=row[13],
-                    seq_stats_local_checkpoint=row[14],
+                    seq_stats_max_seq_no=int(row[12] or 0),
+                    seq_stats_global_checkpoint=int(row[13] or 0),
+                    seq_stats_local_checkpoint=int(row[14] or 0),
cratedb_toolkit/admin/xmover/analysis/shard.py (4)

971-974: Wrap long sort line (Ruff E501) and guard against missing keys

Minor readability fix and added safety.

-            self.latest_shards = sorted(updated_shards, key=lambda s: self.seq_deltas[self._get_shard_compound_id(s)],
-                                        reverse=True)
+            self.latest_shards = sorted(
+                updated_shards,
+                key=lambda s: self.seq_deltas.get(self._get_shard_compound_id(s), 0),
+                reverse=True,
+            )

900-913: Consider hiding DEBUG columns behind a flag (cleaner default UI)

The two DEBUG columns are useful during development but noisy for normal use. Gate them behind a debug flag or remove from default header.

Example approach:

  • Add a show_debug: bool = False attribute to ShardMonitor.
  • Conditionally add the debug columns and values.

883-893: Minor: Node heat sorting by name descending might be unexpected

Using reverse=True sorts both heat and node name descending. Often, descending by heat and ascending by name is preferred for stable alphabetical tie-breaks.

-        sorted_items = sorted(heat_nodes_info.items(), key=lambda kv: (kv[1], kv[0]),  reverse=True)
+        sorted_items = sorted(
+            heat_nodes_info.items(),
+            key=lambda kv: (-kv[1], kv[0])
+        )

842-849: State initialization is deferred; minimal guard note

Attributes like seq_deltas/size_deltas are assigned via refresh_data(). This is fine since they’re used only after that call in monitor_shards, but be mindful if other entrypoints get introduced. Optional: initialize to empty dicts for robustness.

-        self.reference_shards: dict[str, ShardInfo]
-        self.latest_shards: list[ShardInfo]
-        self.seq_deltas: dict[str, int]
-        self.size_deltas: dict[str, float]
+        self.reference_shards: dict[str, ShardInfo] = {}
+        self.latest_shards: list[ShardInfo] = []
+        self.seq_deltas: dict[str, int] = {}
+        self.size_deltas: dict[str, float] = {}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 449a8a9 and 2b2cfb6.

📒 Files selected for processing (4)
  • cratedb_toolkit/admin/xmover/analysis/shard.py (2 hunks)
  • cratedb_toolkit/admin/xmover/cli.py (2 hunks)
  • cratedb_toolkit/admin/xmover/model.py (1 hunks)
  • cratedb_toolkit/admin/xmover/util/database.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cratedb_toolkit/admin/xmover/model.py
🧰 Additional context used
🧬 Code graph analysis (2)
cratedb_toolkit/admin/xmover/analysis/shard.py (3)
cratedb_toolkit/admin/xmover/model.py (2)
  • ShardInfo (33-54)
  • size_gb (84-86)
cratedb_toolkit/admin/xmover/cli.py (1)
  • monitor_shards (72-77)
cratedb_toolkit/admin/xmover/util/format.py (1)
  • format_size (1-8)
cratedb_toolkit/admin/xmover/cli.py (1)
cratedb_toolkit/admin/xmover/analysis/shard.py (3)
  • ShardAnalyzer (33-839)
  • ShardMonitor (842-991)
  • monitor_shards (853-881)
🪛 Ruff (0.12.2)
cratedb_toolkit/admin/xmover/analysis/shard.py

4-4: datetime imported but unused

Remove unused import: datetime

(F401)


10-10: unittest.result imported but unused

Remove unused import: unittest.result

(F401)


853-853: Line too long (146 > 120)

(E501)


864-864: Line too long (173 > 120)

(E501)


868-868: Line too long (135 > 120)

(E501)


874-874: Line too long (126 > 120)

(E501)


876-876: Line too long (122 > 120)

(E501)


968-968: Line too long (137 > 120)

(E501)

cratedb_toolkit/admin/xmover/cli.py

67-67: Line too long (188 > 120)

(E501)


68-68: Line too long (192 > 120)

(E501)


70-70: Line too long (140 > 120)

(E501)

🪛 GitHub Actions: Tests: Common
cratedb_toolkit/admin/xmover/analysis/shard.py

[error] 853-853: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' when evaluating annotation 'str | None' in ShardMonitor.monitor_shards. Python 3.9 does not support the PEP 604 union operator. This occurred during 'pytest -m cfr' collection. Upgrade to Python 3.10+ or replace with 'Optional[str]' / 'Union[str, None]'.

cratedb_toolkit/admin/xmover/cli.py

[error] 1-1: Ruff format check detected formatting issues. This file would be reformatted by Ruff. Step: Poe check (lint[0]); exit code 1.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build-and-publish
  • GitHub Check: build-and-publish
  • GitHub Check: CFR for OS macos-latest
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: CFR for OS macos-13
🔇 Additional comments (1)
cratedb_toolkit/admin/xmover/analysis/shard.py (1)

936-941: Verify schema_name on ShardInfo before updating compound ID
Including the schema in the shard‐ID string is a sound approach to prevent collisions when the same table name exists in multiple schemas. Before applying the refactor, please confirm:

• The ShardInfo type actually exposes a schema_name attribute.
• All callers of _get_shard_compound_id (e.g., map keys, logging, UI components) can handle the new format.

If schema_name is available and you update the method, your change might look like:

-    def _get_shard_compound_id(self, shard: ShardInfo) -> str:
-        if self.sort_by == 'node':
-            return f"{shard.node_name}-{shard.table_name}-{shard.shard_id}"
-        else:
-            return f"{shard.table_name}-{shard.shard_id}-{shard.node_name}"
+    def _get_shard_compound_id(self, shard: ShardInfo) -> str:
+        # include schema to avoid collisions across schemas
+        if self.sort_by == "node":
+            return f"{shard.node_name}-{shard.schema_name}.{shard.table_name}-{shard.shard_id}"
+        return f"{shard.schema_name}.{shard.table_name}-{shard.shard_id}-{shard.node_name}"

Please verify the presence of schema_name and update downstream usage as needed.

@@ -1,16 +1,19 @@
"""
Shard analysis and rebalancing logic for CrateDB
"""

import datetime
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove unused import to satisfy Ruff (F401)

datetime is unused in this module version.

-import datetime
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import datetime
🧰 Tools
🪛 Ruff (0.12.2)

4-4: datetime imported but unused

Remove unused import: datetime

(F401)

🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around line 4, the module
currently imports datetime but does not use it; remove the unused import line
"import datetime" to satisfy the Ruff F401 lint rule and keep imports minimal.

from typing import Any, Dict, List, Optional, Set, Tuple, Union
from unittest import result
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove unused unittest.result import (Ruff F401)

-from unittest import result
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from unittest import result
🧰 Tools
🪛 Ruff (0.12.2)

10-10: unittest.result imported but unused

Remove unused import: unittest.result

(F401)

🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around line 10, the import
"from unittest import result" is unused and flagged by Ruff F401; remove that
import line (or replace it with the correct needed import if the symbol was
intended to be used) so the file no longer imports an unused name.

Comment on lines 850 to 853
self.table_filter: str|None = None
self.sort_by: str = 'heat'

def monitor_shards(self, table_filter: str | None, interval_in_seconds: int = 5, repeat: int = 10, n_shards: int = 40, sort_by: str = 'heat'):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Python 3.9 compatibility: replace PEP 604 unions with Optional[...]

Pipeline fails at runtime when evaluating str | None. Use Optional[str] to support Python 3.9.

-        self.table_filter: str|None = None
+        from typing import Optional  # if not already imported at top
+        self.table_filter: Optional[str] = None

And update the method signature:

-def monitor_shards(self, table_filter: str | None, interval_in_seconds: int = 5, repeat: int = 10, n_shards: int = 40, sort_by: str = 'heat'):
+def monitor_shards(
+    self,
+    table_filter: Optional[str],
+    interval_in_seconds: int = 5,
+    repeat: int = 10,
+    n_shards: int = 40,
+    sort_by: str = "heat",
+    watch: bool = False,  # optional if implementing CLI watch flag
+):

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.12.2)

853-853: Line too long (146 > 120)

(E501)

🪛 GitHub Actions: Tests: Common

[error] 853-853: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' when evaluating annotation 'str | None' in ShardMonitor.monitor_shards. Python 3.9 does not support the PEP 604 union operator. This occurred during 'pytest -m cfr' collection. Upgrade to Python 3.10+ or replace with 'Optional[str]' / 'Union[str, None]'.

🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 850 to 853, the
code uses PEP 604 union types (str | None) which break on Python 3.9; change all
occurrences to typing.Optional[str] (e.g., self.table_filter: Optional[str] =
None and def monitor_shards(self, table_filter: Optional[str], ...)), and ensure
you add/import "from typing import Optional" at the top of the file if not
already present.

Comment on lines 864 to 869
with Live(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas), refresh_per_second=4, console=console) as live_shards:
while True:
sleep(interval_in_seconds)
self.refresh_data()
live_shards.update(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas))
else:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Wrap long Rich Live line(s) to pass Ruff E501 and optionally drive watch

Two lines exceed 120 chars. Also, go_live is hardcoded; consider using a watch parameter from the CLI (optional).

-        go_live = False
+        go_live = watch  # drive from CLI; default False
         if go_live:
-            with Live(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas), refresh_per_second=4, console=console) as live_shards:
+            with Live(
+                self.generate_shards_table(
+                    self._get_top_shards(self.latest_shards, n_shards),
+                    self.seq_deltas,
+                ),
+                refresh_per_second=4,
+                console=console,
+            ) as live_shards:
                 while True:
                     sleep(interval_in_seconds)
                     self.refresh_data()
-                    live_shards.update(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas))
+                    live_shards.update(
+                        self.generate_shards_table(
+                            self._get_top_shards(self.latest_shards, n_shards),
+                            self.seq_deltas,
+                        )
+                    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
with Live(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas), refresh_per_second=4, console=console) as live_shards:
while True:
sleep(interval_in_seconds)
self.refresh_data()
live_shards.update(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas))
else:
go_live = watch # drive from CLI; default False
if go_live:
with Live(
self.generate_shards_table(
self._get_top_shards(self.latest_shards, n_shards),
self.seq_deltas,
),
refresh_per_second=4,
console=console,
) as live_shards:
while True:
sleep(interval_in_seconds)
self.refresh_data()
live_shards.update(
self.generate_shards_table(
self._get_top_shards(self.latest_shards, n_shards),
self.seq_deltas,
)
)
else:
# existing non-live rendering path
🧰 Tools
🪛 Ruff (0.12.2)

864-864: Line too long (173 > 120)

(E501)


868-868: Line too long (135 > 120)

(E501)

🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 864 to 869, two
lines calling Live(..., refresh_per_second=4, console=console) and
live_shards.update(...) exceed 120 characters; wrap the long Rich Live
invocation and the update call to multiple shorter concatenated/indented lines
to satisfy Ruff E501 (<=120 chars). Also replace the hardcoded go_live/watch
behavior by wiring a CLI parameter (e.g., watch or go_live) into this branch and
use that parameter instead of a literal so the visibility loop is controlled
externally.

str(self.reference_shards[shard_compound_id].seq_stats_max_seq_no),
str(shard.seq_stats_max_seq_no)
)
console.print(shards_table)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid double-printing table (UI prints twice per iteration)

display_shards_table_rows prints the table, and then the caller prints it again. Remove the internal print to avoid duplicate output.

-        console.print(shards_table)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
console.print(shards_table)
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around line 934, the function
display_shards_table_rows is printing the shards_table internally and the caller
also prints it, causing duplicate UI output; remove the internal
console.print(shards_table) call (or change the function to only return the
table and let the caller handle printing) so the table is printed exactly once
by the caller.

Comment on lines 954 to 970
refreshed_number = shard.seq_stats_max_seq_no
reference = reference_shards[shard_compound_id].seq_stats_max_seq_no

if refreshed_number < reference:
refreshed_number += 2 ** 63 - 1

seq_result[shard_compound_id] = refreshed_number - reference
size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Defensive handling of None seq-no and rollover magic constant

  • Coalesce seq_stats_max_seq_no to 0 to avoid TypeError when the DB returns NULLs (e.g., non-STARTED shards).
  • The rollover adjustment uses 2 ** 63 - 1. Please confirm this matches CrateDB’s seq_no range; otherwise compute delta against a modulo base or reset reference.
-                refreshed_number = shard.seq_stats_max_seq_no
-                reference = reference_shards[shard_compound_id].seq_stats_max_seq_no
+                refreshed_number = int(shard.seq_stats_max_seq_no or 0)
+                reference = int(reference_shards[shard_compound_id].seq_stats_max_seq_no or 0)
 
-                if refreshed_number < reference:
-                    refreshed_number += 2 ** 63 - 1
+                # Handle rollover: if sequence decreased, assume wraparound.
+                # TODO: confirm expected max range for seq_no in CrateDB.
+                if refreshed_number < reference:
+                    refreshed_number += (2**63 - 1)

🏁 Script executed:

#!/bin/bash
# Locate references to seq_stats_max_seq_no and show context
rg -nH -C3 'seq_stats_max_seq_no' .

Length of output: 2745


Improve seq-no handling: coalesce NULLs → 0 & correct wrap-around delta logic

To avoid runtime errors when seq_stats_max_seq_no is NULL (e.g. non-STARTED shards) and ensure the rollover fixup uses the full 64-bit modulus, please apply the following changes:

• In cratedb_toolkit/admin/xmover/util/database.py (row[12]) and the corresponding model, coalesce the DB value to 0 so downstream code never sees None and TypeErrors from comparisons or arithmetic.
• In cratedb_toolkit/admin/xmover/analysis/shard.py (lines 954–961), update the seq-no math to:

  • Cast to int with or 0 defaults
  • On wrap-around (current < reference), add 1 << 64 (the full unsigned 64-bit modulus), or use modular arithmetic ((refreshed - reference) % (1 << 64)) to get the correct delta.

Proposed diff:

-                refreshed_number = shard.seq_stats_max_seq_no
-                reference = reference_shards[shard_compound_id].seq_stats_max_seq_no
+                refreshed_number = int(shard.seq_stats_max_seq_no or 0)
+                reference        = int(reference_shards[shard_compound_id].seq_stats_max_seq_no or 0)

-                if refreshed_number < reference:
-                    refreshed_number += 2 ** 63 - 1
+                if refreshed_number < reference:
+                    # Handle signed-long overflow: wrap around using full 64-bit modulus
+                    refreshed_number += 1 << 64

-                seq_result[shard_compound_id] = refreshed_number - reference
+                # Or simply: seq_result[shard_compound_id] = (refreshed_number - reference) % (1 << 64)
+                seq_result[shard_compound_id] = refreshed_number - reference
                 size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
refreshed_number = shard.seq_stats_max_seq_no
reference = reference_shards[shard_compound_id].seq_stats_max_seq_no
if refreshed_number < reference:
refreshed_number += 2 ** 63 - 1
seq_result[shard_compound_id] = refreshed_number - reference
size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb
refreshed_number = int(shard.seq_stats_max_seq_no or 0)
reference = int(reference_shards[shard_compound_id].seq_stats_max_seq_no or 0)
if refreshed_number < reference:
# Handle signed-long overflow: wrap around using full 64-bit modulus
refreshed_number += 1 << 64
# Or simply: seq_result[shard_compound_id] = (refreshed_number - reference) % (1 << 64)
seq_result[shard_compound_id] = refreshed_number - reference
size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 954–961 and
cratedb_toolkit/admin/xmover/util/database.py (row[12]) the code can encounter
NULL seq_stats_max_seq_no and uses an incorrect wrap-around fixup; coalesce DB
seq_no to 0 in database.py (and update the model) so callers never see None, and
in shard.py cast seq values to int with a default of 0 and compute the delta
using full 64-bit modular arithmetic (e.g. (refreshed - reference) % (1 << 64))
instead of adding 2**63 - 1 when refreshed < reference so wrap-around is handled
correctly.

Comment on lines 977 to 992
def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]:
if n_shards < 1:
return sorted_shards[:n_shards]
else:
return sorted_shards
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Top-N bug: slices applied only when n_shards < 1

This logic shows all shards for positive N and slices only for negative values. Invert the condition so positive n_shards returns the top-N.
[uggest_essential_refactor]

-    def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]:
-        if n_shards < 1:
-            return sorted_shards[:n_shards]
-        else:
-            return sorted_shards
+    def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]:
+        if n_shards > 0:
+            return sorted_shards[:n_shards]
+        return sorted_shards
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]:
if n_shards < 1:
return sorted_shards[:n_shards]
else:
return sorted_shards
def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]:
if n_shards > 0:
return sorted_shards[:n_shards]
return sorted_shards
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 977 to 981, the
condition is inverted so slices are applied only when n_shards is negative;
change the logic to apply slicing when n_shards is positive: if n_shards > 0
return sorted_shards[:n_shards], else return sorted_shards (or return an empty
list if you want n_shards == 0 to yield no shards).

Comment on lines 65 to 92
@main.command()
@click.option("--table", "-t", default=None, help="Analyze specific table only")
@click.option("--wait-time", "-w", default=10, help="The number of seconds to wait before checking the shards again. The more the wait the more accurate the results will be (default: 10)")
@click.option("--repeat", "-r", default=1, help="The number of times the shards will be checked. The more times the more accurate the results will be. Use -1 for continuous check (default 1)")
@click.option("--max-results", "-m", default=40, help="The number of shards that will be displayed (default: 40)")
@click.option("--sort-by", "-s", default="heat", help="How the shard table is sorted. Valid values are heat, node or table (default: heat)")
@click.pass_context
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Fix Ruff E501: wrap long Click option lines and validate --sort-by choices

The three @click.option lines exceed the 120-char limit and break the pipeline. Also, constrain --sort-by to the supported values to catch typos early.

Apply this diff to wrap and validate:

-@main.command()
-@click.option("--table", "-t", default=None, help="Analyze specific table only")
-@click.option("--wait-time", "-w", default=10, help="The number of seconds to wait before checking the shards again. The more the wait the more accurate the results will be (default: 10)")
-@click.option("--repeat", "-r", default=1, help="The number of times the shards will be checked. The more times the more accurate the results will be. Use -1 for continuous check (default 1)")
-@click.option("--max-results", "-m", default=40, help="The number of shards that will be displayed (default: 40)")
-@click.option("--sort-by", "-s", default="heat", help="How the shard table is sorted. Valid values are heat, node or table (default: heat)")
+@main.command(name="monitor-shards")
+@click.option("--table", "-t", default=None, help="Analyze specific table only")
+@click.option(
+    "--wait-time",
+    "-w",
+    default=10,
+    help=(
+        "Seconds to wait between checks. "
+        "Longer waits yield more accurate deltas (default: 10)."
+    ),
+)
+@click.option(
+    "--repeat",
+    "-r",
+    default=1,
+    help=(
+        "Number of iterations to run. Use -1 for continuous monitoring "
+        "(default: 1)."
+    ),
+)
+@click.option(
+    "--max-results",
+    "-m",
+    default=40,
+    help="Maximum number of shards to display (default: 40)",
+)
+@click.option(
+    "--sort-by",
+    "-s",
+    type=click.Choice(["heat", "node", "table"]),
+    default="heat",
+    help="Sort by heat, node, or table (default: heat)",
+)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@main.command()
@click.option("--table", "-t", default=None, help="Analyze specific table only")
@click.option("--wait-time", "-w", default=10, help="The number of seconds to wait before checking the shards again. The more the wait the more accurate the results will be (default: 10)")
@click.option("--repeat", "-r", default=1, help="The number of times the shards will be checked. The more times the more accurate the results will be. Use -1 for continuous check (default 1)")
@click.option("--max-results", "-m", default=40, help="The number of shards that will be displayed (default: 40)")
@click.option("--sort-by", "-s", default="heat", help="How the shard table is sorted. Valid values are heat, node or table (default: heat)")
@click.pass_context
@main.command(name="monitor-shards")
@click.option("--table", "-t", default=None, help="Analyze specific table only")
@click.option(
"--wait-time",
"-w",
default=10,
help=(
"Seconds to wait between checks. "
"Longer waits yield more accurate deltas (default: 10)."
),
)
@click.option(
"--repeat",
"-r",
default=1,
help=(
"Number of iterations to run. Use -1 for continuous monitoring "
"(default: 1)."
),
)
@click.option(
"--max-results",
"-m",
default=40,
help="Maximum number of shards to display (default: 40)",
)
@click.option(
"--sort-by",
"-s",
type=click.Choice(["heat", "node", "table"]),
default="heat",
help="Sort by heat, node, or table (default: heat)",
)
@click.pass_context
🧰 Tools
🪛 Ruff (0.12.2)

67-67: Line too long (188 > 120)

(E501)


68-68: Line too long (192 > 120)

(E501)


70-70: Line too long (140 > 120)

(E501)

🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/cli.py around lines 65 to 71, several
@click.option decorators exceed the 120-char limit and the --sort-by option is
not constrained; wrap the long option lines to keep them under 120 characters
(split arguments across lines so each line is <=120 chars) and change the
--sort-by option to use click.Choice(["heat","node","table"]) (with an
appropriate default of "heat") so invalid values are rejected early; keep other
option defaults and help text the same.

Comment on lines 150 to 155
s.routing_state,
s.seq_no_stats['max_seq_no'],
s.seq_no_stats['global_checkpoint'],
s.seq_no_stats['local_checkpoint']
FROM sys.shards s
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Seq-no stats can be NULL for non-STARTED shards; coalesce at the source

When for_analysis=True, non-STARTED shards may return NULLs for seq_no_stats. Downstream code does arithmetic on these fields and will crash. Coalesce to 0 in SQL to guarantee numeric values.

-            s.routing_state,
-            s.seq_no_stats['max_seq_no'],
-            s.seq_no_stats['global_checkpoint'],
-            s.seq_no_stats['local_checkpoint']
+            s.routing_state,
+            COALESCE(s.seq_no_stats['max_seq_no'], 0),
+            COALESCE(s.seq_no_stats['global_checkpoint'], 0),
+            COALESCE(s.seq_no_stats['local_checkpoint'], 0)

🏁 Script executed:

#!/bin/bash
set -eux

# Locate the target file
file=$(find . -type f -path "*cratedb_toolkit/admin/xmover/util/database.py" -print -quit)
echo "Inspecting $file"

# Display context around the SQL snippet
sed -n '130,200p' "$file"

# Find where for_analysis parameter is referenced in this file
grep -n "for_analysis" "$file"

# Search the entire repo for usage of max_seq_no to locate downstream arithmetic
grep -R "max_seq_no" -n .

Length of output: 4061


🏁 Script executed:

#!/bin/bash
set -eux

# Display the function definition containing the seq_no_stats query
file="./cratedb_toolkit/admin/xmover/util/database.py"
echo "===== Part of database.py (lines 1-140) ====="
sed -n '1,140p' "$file"

# Display the ShardInfo model to see how seq_stats_* fields are defined
echo "===== Part of model.py (lines 1-200) ====="
sed -n '1,200p' cratedb_toolkit/admin/xmover/model.py

Length of output: 9969


Coalesce seq_no_stats at the source to avoid NULLs for non-STARTED shards

When for_analysis=True, the SQL will include shards in states other than STARTED, and s.seq_no_stats[...] can be NULL. Since downstream code assigns these directly to integer fields and performs arithmetic, those NULLs will cause runtime errors. Coalesce to 0 in the SELECT to guarantee non-null numeric values.

• File: cratedb_toolkit/admin/xmover/util/database.py
Lines: 146–150 (in the get_shards_info query)

Suggested diff:

         s.routing_state,
-        s.seq_no_stats['max_seq_no'],
-        s.seq_no_stats['global_checkpoint'],
-        s.seq_no_stats['local_checkpoint']
+        COALESCE(s.seq_no_stats['max_seq_no'], 0)         AS max_seq_no,
+        COALESCE(s.seq_no_stats['global_checkpoint'], 0)  AS global_checkpoint,
+        COALESCE(s.seq_no_stats['local_checkpoint'], 0)   AS local_checkpoint
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
s.routing_state,
s.seq_no_stats['max_seq_no'],
s.seq_no_stats['global_checkpoint'],
s.seq_no_stats['local_checkpoint']
FROM sys.shards s
s.routing_state,
COALESCE(s.seq_no_stats['max_seq_no'], 0) AS max_seq_no,
COALESCE(s.seq_no_stats['global_checkpoint'], 0) AS global_checkpoint,
COALESCE(s.seq_no_stats['local_checkpoint'], 0) AS local_checkpoint
FROM sys.shards s
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/util/database.py around lines 146–150, the
SELECT uses s.seq_no_stats['max_seq_no'], s.seq_no_stats['global_checkpoint'],
and s.seq_no_stats['local_checkpoint'] directly which can be NULL when
for_analysis=True (non-STARTED shards); update the query to COALESCE each of
those expressions to 0 in the SELECT so the returned values are guaranteed
non-null integers and downstream arithmetic/assignments won’t error.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (8)
cratedb_toolkit/admin/xmover/analysis/shard.py (8)

4-4: Remove unused import to fix Ruff F401

The datetime import is unused and flagged by static analysis.


8-8: Remove unused import to fix Ruff F401

The sleep import from time is unused and flagged by static analysis.


10-10: Remove unused import to fix Ruff F401

The unittest.result import is unused and flagged by static analysis.


922-922: Remove duplicate table printing

The function prints the table internally and the caller also prints it, causing duplicate output.


942-949: Fix potential None handling and rollover arithmetic

The code doesn't handle potential None values for seq_stats_max_seq_no and uses incorrect rollover arithmetic.


965-969: Fix inverted logic for top-N shard filtering

The condition is inverted - it shows all shards for positive N and slices only for negative values.


853-853: Fix Python 3.9 compatibility and line length

The method signature uses PEP 604 union syntax and exceeds line length limits.

Apply this diff to fix both issues:

-    def monitor_shards(self, table_filter: str | None, interval_in_seconds: int = 5, repeat: int = 10, n_shards: int = 40, sort_by: str = 'heat'):
+    def monitor_shards(
+        self,
+        table_filter: Optional[str],
+        interval_in_seconds: int = 5,
+        repeat: int = 10,
+        n_shards: int = 40,
+        sort_by: str = 'heat',
+    ):

862-874: Implement proper watch functionality and fix infinite loop

The current implementation has several issues:

  • Infinite loop without proper exit condition for non-watch mode
  • Missing watch parameter functionality
  • Line length violations

Apply this diff to fix the monitoring logic:

         iterations = 0
+        go_live = False  # TODO: wire from CLI watch parameter
         while True:
             sleep(interval_in_seconds)
             self.refresh_data()
-            shards_table = self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas)
+            shards_table = self.generate_shards_table(
+                self._get_top_shards(self.latest_shards, n_shards), 
+                self.seq_deltas
+            )
             console.print(shards_table)
-            nodes_table = self.generate_nodes_table(self._get_nodes_heat_info(self.reference_shards, self.seq_deltas))
+            nodes_table = self.generate_nodes_table(
+                self._get_nodes_heat_info(self.reference_shards, self.seq_deltas)
+            )
             console.print(nodes_table)
 
             iterations += 1
             if 0 < repeat <= iterations:
                 break
🧹 Nitpick comments (2)
cratedb_toolkit/admin/xmover/analysis/shard.py (2)

956-956: Fix line length violation

The line exceeds the 120-character limit.

Apply this diff to wrap the long line:

-        updated_shards: list[ShardInfo] = [s for s in self.analyzer.shards if not self.table_filter or self.table_filter == s.table_name]
+        updated_shards: list[ShardInfo] = [
+            s for s in self.analyzer.shards 
+            if not self.table_filter or self.table_filter == s.table_name
+        ]

958-962: Fix line length violations in sorting logic

Multiple lines exceed the 120-character limit.

Apply this diff to wrap the long lines:

         if self.sort_by == 'heat':
-            self.latest_shards = sorted(updated_shards, key=lambda s: self.seq_deltas[self._get_shard_compound_id(s)],
-                                        reverse=True)
+            self.latest_shards = sorted(
+                updated_shards, 
+                key=lambda s: self.seq_deltas[self._get_shard_compound_id(s)],
+                reverse=True
+            )
         else:
-            self.latest_shards = sorted(updated_shards, key=lambda s: self._get_shard_compound_id(s))
+            self.latest_shards = sorted(
+                updated_shards, 
+                key=lambda s: self._get_shard_compound_id(s)
+            )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2b2cfb6 and 8b2ce40.

📒 Files selected for processing (1)
  • cratedb_toolkit/admin/xmover/analysis/shard.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cratedb_toolkit/admin/xmover/analysis/shard.py (3)
cratedb_toolkit/admin/xmover/model.py (2)
  • ShardInfo (33-54)
  • size_gb (84-86)
cratedb_toolkit/admin/xmover/cli.py (1)
  • monitor_shards (72-77)
cratedb_toolkit/admin/xmover/util/format.py (1)
  • format_size (1-8)
🪛 Ruff (0.12.2)
cratedb_toolkit/admin/xmover/analysis/shard.py

4-4: datetime imported but unused

Remove unused import: datetime

(F401)


10-10: unittest.result imported but unused

Remove unused import: unittest.result

(F401)


16-16: rich.live.Live imported but unused

Remove unused import: rich.live.Live

(F401)


853-853: Line too long (146 > 120)

(E501)


866-866: Line too long (122 > 120)

(E501)


956-956: Line too long (137 > 120)

(E501)

🪛 GitHub Actions: Tests: Common
cratedb_toolkit/admin/xmover/analysis/shard.py

[error] 853-853: pytest -m pymongo failed during collection: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' in ShardMonitor.monitor_shards signature. This uses Python 3.10+'s PEP 604 union syntax ('str | None'), not supported on Python 3.9. Replace with Optional[str] or upgrade Python.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-and-publish
  • GitHub Check: build-and-publish


from rich import box
from rich.console import Console
from rich.panel import Panel
from rich.table import Table
from rich.live import Live
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove unused import to fix Ruff F401

The Live import from rich.live is unused and flagged by static analysis.

-from rich.live import Live
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from rich.live import Live
-from rich.live import Live
🧰 Tools
🪛 Ruff (0.12.2)

16-16: rich.live.Live imported but unused

Remove unused import: rich.live.Live

(F401)

🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around line 16 the import
"from rich.live import Live" is unused and triggers Ruff F401; remove that
import statement (or, if Live is intended to be used later, use it where
required) so the unused symbol is gone and re-run linters to confirm the warning
is resolved.

Comment on lines 842 to 860
class ShardMonitor:
def __init__(self, analyzer: ShardAnalyzer):
self.analyzer = analyzer
self.reference_shards: dict[str, ShardInfo]
self.latest_shards: list[ShardInfo]
self.seq_deltas: dict[str, int]
self.size_deltas: dict[str, float]

self.table_filter: str|None = None
self.sort_by: str = 'heat'

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix Python 3.9 compatibility and improve type annotations

The code uses PEP 604 union syntax (str | None) which breaks on Python 3.9 and has missing type annotations for instance variables.

Apply this diff to fix compatibility and type annotations:

+from typing import Optional
+
 class ShardMonitor:
     def __init__(self, analyzer: ShardAnalyzer):
         self.analyzer = analyzer
-        self.reference_shards: dict[str, ShardInfo]
-        self.latest_shards: list[ShardInfo]
-        self.seq_deltas: dict[str, int]
-        self.size_deltas: dict[str, float]
+        self.reference_shards: dict[str, ShardInfo] = {}
+        self.latest_shards: list[ShardInfo] = []
+        self.seq_deltas: dict[str, int] = {}
+        self.size_deltas: dict[str, float] = {}
 
-        self.table_filter: str|None = None
+        self.table_filter: Optional[str] = None
         self.sort_by: str = 'heat'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class ShardMonitor:
def __init__(self, analyzer: ShardAnalyzer):
self.analyzer = analyzer
self.reference_shards: dict[str, ShardInfo]
self.latest_shards: list[ShardInfo]
self.seq_deltas: dict[str, int]
self.size_deltas: dict[str, float]
self.table_filter: str|None = None
self.sort_by: str = 'heat'
from typing import Optional
class ShardMonitor:
def __init__(self, analyzer: ShardAnalyzer):
self.analyzer = analyzer
self.reference_shards: dict[str, ShardInfo] = {}
self.latest_shards: list[ShardInfo] = []
self.seq_deltas: dict[str, int] = {}
self.size_deltas: dict[str, float] = {}
self.table_filter: Optional[str] = None
self.sort_by: str = 'heat'
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 842-852, the
class uses PEP 604 union syntax and bare built-in generics which break on Python
3.9 and leaves instance attributes uninitialized/unannotated; change type hints
to use typing imports (from typing import Dict, List, Optional) and
annotate/initialize the attributes, e.g. reference_shards: Dict[str, ShardInfo]
= {} , latest_shards: List[ShardInfo] = [], seq_deltas: Dict[str, int] = {},
size_deltas: Dict[str, float] = {}, and table_filter: Optional[str] = None,
keeping sort_by: str = 'heat'. Ensure typing imports are added at top of file.

@juanpardo juanpardo force-pushed the jp/xmover-shard-hot-monitor branch from 2d98ae3 to 92e1d81 Compare September 10, 2025 15:31
@amotl amotl force-pushed the xmover branch 2 times, most recently from dac2a6d to cb05b4d Compare September 10, 2025 23:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants