-
Notifications
You must be signed in to change notification settings - Fork 4
Admin/XMover: Shard heat monitoring, optionally watching #527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: xmover
Are you sure you want to change the base?
Conversation
- More or less just line-length fixes. - Only a single type adjustment was needed on the return value of the `analyze_distribution` method. - Ruff recommended to use set comprehensions, so here we go. - At a single spot where an exception has been `pass`ed, we added error output. Is it bad?
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughAdds a ShardMonitor class and CLI subcommand to periodically monitor hot shards (computing seq/size deltas and rendering tables), extends ShardInfo with three seq_no_stats fields, updates DB queries to return those fields, and introduces duplicate ShardMonitor declarations. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant CLI as xmover CLI
participant Analyzer as ShardAnalyzer
participant DB as CrateDB
participant Monitor as ShardMonitor
participant UI as Rich Console
User->>CLI: run "xmover monitor_shards"
CLI->>Analyzer: construct(client)
CLI->>Monitor: construct(analyzer)
loop polling
Monitor->>Analyzer: refresh shards
Analyzer->>DB: SELECT ... (+ seq_no_stats columns)
DB-->>Analyzer: rows
Analyzer-->>Monitor: list[ShardInfo]
Monitor->>Monitor: compute seq/size deltas, update reference map
Monitor->>UI: render shards table and nodes heat panel
Monitor->>Monitor: sleep / repeat
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cratedb_toolkit/admin/xmover/analysis/shard.py (1)
1-15
: Remove commented-out code to satisfy ERA001 lint checksCI linting shows remaining ERA001 errors in
cratedb_toolkit/admin/xmover/analysis/shard.py
. Please delete all commented-outshards_table.add_column(...)
and related lines, then re-run the formatter and linter:• File:
cratedb_toolkit/admin/xmover/analysis/shard.py
– Remove lines 871–873:
python # shards_table.add_column("Global Checkpoint", style="magenta") # shards_table.add_column("Local Checkpoint", style="magenta") # shards_table.add_column("Hot Timestamp", style="magenta")
– Remove lines 898–900:
python # str(shard.seq_stats_global_checkpoint), # str(shard.seq_stats_local_checkpoint), # str(hot_timestamp) if shard.hot_timestamp else "-",
After cleanup, run:
ruff format cratedb_toolkit/admin/xmover/analysis/shard.py ruff check --fix cratedb_toolkit/admin/xmover/analysis/shard.pyto confirm no lint errors remain.
🧹 Nitpick comments (1)
cratedb_toolkit/admin/xmover/cli.py (1)
65-73
: Add watch/interval flags to align with PR objective (optional)To deliver the “--watch-like real-time updating view,” consider adding --watch and --refresh-interval options (mirroring monitor_recovery) and pass them to ShardMonitor.monitor_shards(watch, refresh_interval). Otherwise, keep this as a one-shot snapshot and document accordingly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
cratedb_toolkit/admin/xmover/analysis/shard.py
(2 hunks)cratedb_toolkit/admin/xmover/cli.py
(2 hunks)cratedb_toolkit/admin/xmover/model.py
(1 hunks)cratedb_toolkit/admin/xmover/util/database.py
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
cratedb_toolkit/admin/xmover/analysis/shard.py
8-8: time.sleep
imported but unused
Remove unused import: time.sleep
(F401)
871-871: Found commented-out code
Remove commented-out code
(ERA001)
872-872: Found commented-out code
Remove commented-out code
(ERA001)
873-873: Found commented-out code
Remove commented-out code
(ERA001)
898-898: Found commented-out code
Remove commented-out code
(ERA001)
899-899: Found commented-out code
Remove commented-out code
(ERA001)
900-900: Found commented-out code
Remove commented-out code
(ERA001)
🪛 GitHub Actions: Tests: Common
cratedb_toolkit/admin/xmover/analysis/shard.py
[error] 1-1: ruff format --check would reformat cratedb_toolkit/admin/xmover/analysis/shard.py. 1 file would be reformatted. Lint step 'lint[0]' aborted with exit code 1.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-and-publish
- GitHub Check: build-and-publish
🔇 Additional comments (4)
cratedb_toolkit/admin/xmover/model.py (1)
32-51
: AllShardInfo
instantiations have been updated—no further action required
- Verified a single instantiation in
cratedb_toolkit/admin/xmover/util/database.py
(lines 166–183)
– All 16 parameters of the updatedShardInfo
dataclass are passed by name and in the correct order.cratedb_toolkit/admin/xmover/cli.py (1)
13-13
: Import looks correct and aligns with new feature wiringcratedb_toolkit/admin/xmover/util/database.py (2)
179-183
: ShardInfo mapping expects Optional[int] after model changeOnce ShardInfo types are Optional[int], these assignments are fine even if the query returns NULL. If you prefer strict ints, add COALESCE in SQL or convert here.
150-155
: Refactorhot_timestamp
extraction to avoid unsupported array lambdasCrateDB does not support higher‐order functions like
transform(array, lambda)
norarray_max
on arrays of objects—it only providesarray_max(array)
for primitive arrays (cratedb.com).Instead, guard against out-of-bounds access with a length check (for example, using
cardinality
) and fall back toNULL
when the second lease isn’t present. You can replace the direct index access:- s.routing_state, - s.seq_no_stats['max_seq_no'], - s.seq_no_stats['global_checkpoint'], - s.seq_no_stats['local_checkpoint'], - s.retention_leases['leases'][1]['timestamp'] + s.routing_state, + s.seq_no_stats['max_seq_no'] AS max_seq_no, + s.seq_no_stats['global_checkpoint'] AS global_checkpoint, + s.seq_no_stats['local_checkpoint'] AS local_checkpoint, + CASE + WHEN cardinality(s.retention_leases['leases']) > 1 + THEN s.retention_leases['leases'][1]['timestamp'] + ELSE NULL + END AS hot_timestampThis avoids runtime errors when
leases
has fewer than two elements (and still returns the second-element timestamp when available).If you truly need the latest lease timestamp across all objects, you must either:
- Flatten and aggregate in your application layer (since CrateDB lacks array lambdas), or
- Define a custom UDF (e.g., in JavaScript) to compute the max
timestamp
on the server side.Likely an incorrect or invalid review comment.
def __init__(self, analyzer: ShardAnalyzer): | ||
self.analyzer = analyzer | ||
self.shard_time_deltas: dict[str, int] = {} | ||
self.shard_seq_deltas: dict[str, int] = {} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Store computed deltas and avoid key collisions across tables
The dictionaries are typed for int values, but hot deltas can be absent. Also, keying by node_id-shard_id risks collisions across tables. Make time deltas Optional[int] and use a compound key including schema+table.
Apply this diff:
class ShardMonitor:
def __init__(self, analyzer: ShardAnalyzer):
self.analyzer = analyzer
- self.shard_time_deltas: dict[str, int] = {}
- self.shard_seq_deltas: dict[str, int] = {}
+ self.shard_time_deltas: dict[str, Optional[int]] = {}
+ self.shard_seq_deltas: dict[str, int] = {}
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 841 to 845, the
two dicts currently typed as dict[str, int] and keyed only by node_id-shard_id
can lose absent (hot) deltas and collide across different tables; change their
type annotations to dict[str, Optional[int]] (or dict[tuple[str,str,str],
Optional[int]] if you prefer tuple keys) to allow None for missing deltas, and
change the keying to include schema and table (e.g. compound key using
schema.table plus node_id and shard_id or a tuple of (schema, table, node_id,
shard_id)) so entries are unique across tables; update all places that
read/write these dicts to handle Optional[int] (check for None) and to
construct/parse the new compound key consistently.
seq_stats_max_seq_no: int | ||
seq_stats_global_checkpoint: int | ||
seq_stats_local_checkpoint: int | ||
hot_timestamp: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Make new ShardInfo fields Optional with defaults to avoid None-related crashes and ease compatibility
seq_no_stats and retention_leases can be missing/null on some shards. With strict int typing and no defaults, construction will break and downstream code (e.g., datetime.fromtimestamp) can crash. Recommend Optional[int] with default None.
Apply this diff:
- seq_stats_max_seq_no: int
- seq_stats_global_checkpoint: int
- seq_stats_local_checkpoint: int
- hot_timestamp: int
+ seq_stats_max_seq_no: Optional[int] = None
+ seq_stats_global_checkpoint: Optional[int] = None
+ seq_stats_local_checkpoint: Optional[int] = None
+ hot_timestamp: Optional[int] = None
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
seq_stats_max_seq_no: int | |
seq_stats_global_checkpoint: int | |
seq_stats_local_checkpoint: int | |
hot_timestamp: int | |
seq_stats_max_seq_no: Optional[int] = None | |
seq_stats_global_checkpoint: Optional[int] = None | |
seq_stats_local_checkpoint: Optional[int] = None | |
hot_timestamp: Optional[int] = None |
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/model.py around lines 48 to 51, the integer
fields seq_stats_max_seq_no, seq_stats_global_checkpoint,
seq_stats_local_checkpoint, and hot_timestamp must be made optional with
defaults to avoid construction crashes when values are missing/null; change
their type annotations to Optional[int] and give them a default of None, and
ensure Optional is imported from typing (or typing import Optional is added at
top); likewise, review nearby shard fields like seq_no_stats and
retention_leases and apply Optional with default None where they can be absent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (6)
cratedb_toolkit/admin/xmover/analysis/shard.py (6)
884-890
: Make interval configurable and refresh both snapshotsHardcoding 10 seconds and not refreshing the baseline snapshot may surprise users. Expose interval and always refresh both snapshots to get a consistent pair.
Apply this diff:
- def monitor_shards(self): - max_seconds_old = 300 - - self.refresh_deltas(refresh_data=False) - sleep(10) - self.refresh_deltas() + def monitor_shards(self, interval_seconds: int = 10, max_seconds_old: int = 300): + # Take two fresh snapshots separated by interval_seconds + self.refresh_deltas(refresh_data=True) + if interval_seconds > 0: + sleep(interval_seconds) + self.refresh_deltas(refresh_data=True)
891-893
: Sort by hottest (largest seq delta first) and wrap long lineCurrent sort is ascending and exceeds line length. Sort by delta desc to surface hot shards first and break lines.
Apply this diff:
- shards: list[ShardInfo] = self.filter_shards(max_seconds_old) - sorted_shards: list[ShardInfo] = sorted(shards, key=lambda s: self.shard_seq_deltas[self._get_shard_compound_id(s)]) + shards: list[ShardInfo] = self.filter_shards(max_seconds_old) + sorted_shards: list[ShardInfo] = sorted( + shards, + key=lambda s: self.shard_seq_deltas.get(self._get_shard_compound_id(s), 0), + reverse=True, + )
907-918
: Render human-friendly age and guard missing deltasShow "-" when age is unknown and add "s" suffix; guard seq delta lookup.
Apply this diff:
for shard in sorted_shards: shard_compound_id = self._get_shard_compound_id(shard) shards_table.add_row( shard.schema_name, shard.table_name, str(shard.shard_id), shard.node_name, str(shard.is_primary), format_size(shard.size_gb), - str(self.shard_seq_deltas[shard_compound_id]), - str(self.shard_time_deltas[shard_compound_id]), + str(self.shard_seq_deltas.get(shard_compound_id, 0)), + (f"{self.shard_time_deltas.get(shard_compound_id)}s" + if self.shard_time_deltas.get(shard_compound_id) is not None + else "-"), )
849-849
: Trim trailing whitespace on blank linesThese lines trigger W293 in Ruff. Remove stray spaces.
Also applies to: 922-922
863-865
: Align model typing for hot_timestamp with runtime usageThis code treats hot_timestamp as Optional[int], but ShardInfo currently annotates it as int (see cratedb_toolkit/admin/xmover/model.py:32-54). If None is possible from sys.shards, update the dataclass to Optional[int] and ensure util/database populates None accordingly.
Would you like me to patch model.py and util/database.py to reflect Optional[int] and add unit tests covering None timestamps?
Also applies to: 878-883
884-923
: Optional: add --watch loop using Rich Live for continuous modeGiven the PR goal, consider a watch loop (opt-in) that refreshes and re-renders at interval_seconds using rich.live.Live. This keeps discrete mode as default while enabling continuous monitoring via a flag wired in CLI.
If you want, I can draft the Live-based loop and a CLI flag (e.g., --watch/--interval).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
cratedb_toolkit/admin/xmover/analysis/shard.py
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cratedb_toolkit/admin/xmover/analysis/shard.py (3)
cratedb_toolkit/admin/xmover/model.py (2)
ShardInfo
(33-55)size_gb
(85-87)cratedb_toolkit/admin/xmover/cli.py (1)
monitor_shards
(67-72)cratedb_toolkit/admin/xmover/util/format.py (1)
format_size
(1-8)
🪛 Ruff (0.12.2)
cratedb_toolkit/admin/xmover/analysis/shard.py
849-849: Blank line contains whitespace
Remove whitespace from blank line
(W293)
867-867: Test for membership should be not in
Convert to not in
(E713)
870-870: Line too long (128 > 120)
(E501)
874-874: Variable id
is shadowing a Python builtin
(A001)
892-892: Line too long (124 > 120)
(E501)
922-922: Blank line contains whitespace
Remove whitespace from blank line
(W293)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-and-publish
- GitHub Check: build-and-test
- GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
- GitHub Check: CFR for OS ubuntu-latest
- GitHub Check: CFR for OS macos-latest
- GitHub Check: CFR for OS windows-latest
- GitHub Check: CFR for OS macos-13
🔇 Additional comments (2)
cratedb_toolkit/admin/xmover/analysis/shard.py (2)
8-8
: Resolved: sleep import now used by monitor_shards()The prior lint warning about an unused import is addressed; sleep is used on Line 888. No action needed here.
840-921
: ShardMonitor definitions, CLI flags, and hot_timestamp hydration all verified
- Only one
ShardMonitor
class exists (in analysis/shard.py).- CLI exposes
--watch
/-w
and--refresh-interval
flags in cratedb_toolkit/admin/xmover/cli.py.hot_timestamp
is correctly read from the DB (row[15] in database.py), stored as an integer in the model, and converted from milliseconds to seconds in analysis/shard.py.No further action required.
def _get_shard_compound_id(self, shard: ShardInfo) -> str: | ||
return f"{shard.node_id}-{shard.shard_id}" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Compound shard key risks collisions across tables
Keying by node_id-shard_id can collide between tables sharing the same shard id on the same node. Include schema and table to guarantee uniqueness.
Apply this diff:
def _get_shard_compound_id(self, shard: ShardInfo) -> str:
- return f"{shard.node_id}-{shard.shard_id}"
+ # schema.table:shard_id@node_id
+ return f"{shard.schema_name}.{shard.table_name}:{shard.shard_id}@{shard.node_id}"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _get_shard_compound_id(self, shard: ShardInfo) -> str: | |
return f"{shard.node_id}-{shard.shard_id}" | |
def _get_shard_compound_id(self, shard: ShardInfo) -> str: | |
# schema.table:shard_id@node_id | |
return f"{shard.schema_name}.{shard.table_name}:{shard.shard_id}@{shard.node_id}" |
🧰 Tools
🪛 Ruff (0.12.2)
849-849: Blank line contains whitespace
Remove whitespace from blank line
(W293)
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 847-849 the
compound key uses only node_id and shard_id which can collide for different
tables on the same node; change the function to include schema and table
identifiers (e.g. schema and table name) along with node_id and shard_id to
guarantee uniqueness, and defensively handle missing attributes (use available
schema/table fields or sensible fallbacks) so callers continue to work.
cratedb_toolkit/admin/xmover/cli.py
Outdated
@main.command() | ||
@click.pass_context | ||
def monitor_shards(ctx): | ||
"""Monitor shards, pointing out hot ones""" | ||
client = ctx.obj["client"] | ||
analyzer = ShardAnalyzer(client) | ||
monitor = ShardMonitor(analyzer) | ||
monitor.monitor_shards() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea about »naming things«: xmover monitor-shards
certainly fits the bill, but what do you think about xmover shard-heat
1 to better reflect its intentions?
Footnotes
-
... with an optional
--watch
option in the long run, if we can make the program also produce static one-shot reports in a future iteration? ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
♻️ Duplicate comments (1)
cratedb_toolkit/admin/xmover/cli.py (1)
65-65
: Command naming: considershard-heat
for intent clarity (alias or rename)Prior review suggested
shard-heat
as a better name. You can either rename or add an alias while keepingmonitor-shards
.Option A (rename; simplest):
-@main.command(name="monitor-shards") +@main.command(name="shard-heat")Option B (add alias while keeping current name):
@main.command(name="shard-heat") @click.option(... same options ...) def monitor_shards(...): ... + +# Register alias for backward-compat +main.add_command(monitor_shards, name="monitor-shards")
🧹 Nitpick comments (6)
cratedb_toolkit/admin/xmover/cli.py (1)
72-78
: Wire a--watch
UX to match PR goals (optional live mode)The PR mentions a watch-style mode; currently
repeat=-1
simulates continuous mode but Live rendering toggle is hardcoded elsewhere. Consider adding a--watch/--no-watch
flag and passing it through, so we can enable Rich Live when desired.Proposed minimal CLI change (ShardsMonitor will need a matching parameter):
-@click.pass_context -def monitor_shards(ctx, table, wait_time, repeat, max_results, sort_by): +@click.option("--watch/--no-watch", default=False, help="Live-updating view") +@click.pass_context +def monitor_shards(ctx, table, wait_time, repeat, max_results, sort_by, watch): """Monitor shards, pointing out hot ones""" client = ctx.obj["client"] analyzer = ShardAnalyzer(client) monitor = ShardMonitor(analyzer) - monitor.monitor_shards(table, wait_time, repeat, max_results, sort_by) + monitor.monitor_shards(table, wait_time, repeat, max_results, sort_by, watch=watch)If you want, I can follow up with the matching changes in
ShardMonitor.monitor_shards
.cratedb_toolkit/admin/xmover/util/database.py (1)
178-181
: Cast seq-no fields to int on ingestion to avoid Optional[int] surprisesMake types explicit and robust against None (post-COALESCE this is defensive, but keeps the Python side consistent).
- seq_stats_max_seq_no=row[12], - seq_stats_global_checkpoint=row[13], - seq_stats_local_checkpoint=row[14], + seq_stats_max_seq_no=int(row[12] or 0), + seq_stats_global_checkpoint=int(row[13] or 0), + seq_stats_local_checkpoint=int(row[14] or 0),cratedb_toolkit/admin/xmover/analysis/shard.py (4)
971-974
: Wrap long sort line (Ruff E501) and guard against missing keysMinor readability fix and added safety.
- self.latest_shards = sorted(updated_shards, key=lambda s: self.seq_deltas[self._get_shard_compound_id(s)], - reverse=True) + self.latest_shards = sorted( + updated_shards, + key=lambda s: self.seq_deltas.get(self._get_shard_compound_id(s), 0), + reverse=True, + )
900-913
: Consider hiding DEBUG columns behind a flag (cleaner default UI)The two DEBUG columns are useful during development but noisy for normal use. Gate them behind a
debug
flag or remove from default header.Example approach:
- Add a
show_debug: bool = False
attribute toShardMonitor
.- Conditionally add the debug columns and values.
883-893
: Minor: Node heat sorting by name descending might be unexpectedUsing
reverse=True
sorts both heat and node name descending. Often, descending by heat and ascending by name is preferred for stable alphabetical tie-breaks.- sorted_items = sorted(heat_nodes_info.items(), key=lambda kv: (kv[1], kv[0]), reverse=True) + sorted_items = sorted( + heat_nodes_info.items(), + key=lambda kv: (-kv[1], kv[0]) + )
842-849
: State initialization is deferred; minimal guard noteAttributes like
seq_deltas
/size_deltas
are assigned viarefresh_data()
. This is fine since they’re used only after that call inmonitor_shards
, but be mindful if other entrypoints get introduced. Optional: initialize to empty dicts for robustness.- self.reference_shards: dict[str, ShardInfo] - self.latest_shards: list[ShardInfo] - self.seq_deltas: dict[str, int] - self.size_deltas: dict[str, float] + self.reference_shards: dict[str, ShardInfo] = {} + self.latest_shards: list[ShardInfo] = [] + self.seq_deltas: dict[str, int] = {} + self.size_deltas: dict[str, float] = {}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
cratedb_toolkit/admin/xmover/analysis/shard.py
(2 hunks)cratedb_toolkit/admin/xmover/cli.py
(2 hunks)cratedb_toolkit/admin/xmover/model.py
(1 hunks)cratedb_toolkit/admin/xmover/util/database.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cratedb_toolkit/admin/xmover/model.py
🧰 Additional context used
🧬 Code graph analysis (2)
cratedb_toolkit/admin/xmover/analysis/shard.py (3)
cratedb_toolkit/admin/xmover/model.py (2)
ShardInfo
(33-54)size_gb
(84-86)cratedb_toolkit/admin/xmover/cli.py (1)
monitor_shards
(72-77)cratedb_toolkit/admin/xmover/util/format.py (1)
format_size
(1-8)
cratedb_toolkit/admin/xmover/cli.py (1)
cratedb_toolkit/admin/xmover/analysis/shard.py (3)
ShardAnalyzer
(33-839)ShardMonitor
(842-991)monitor_shards
(853-881)
🪛 Ruff (0.12.2)
cratedb_toolkit/admin/xmover/analysis/shard.py
4-4: datetime
imported but unused
Remove unused import: datetime
(F401)
10-10: unittest.result
imported but unused
Remove unused import: unittest.result
(F401)
853-853: Line too long (146 > 120)
(E501)
864-864: Line too long (173 > 120)
(E501)
868-868: Line too long (135 > 120)
(E501)
874-874: Line too long (126 > 120)
(E501)
876-876: Line too long (122 > 120)
(E501)
968-968: Line too long (137 > 120)
(E501)
cratedb_toolkit/admin/xmover/cli.py
67-67: Line too long (188 > 120)
(E501)
68-68: Line too long (192 > 120)
(E501)
70-70: Line too long (140 > 120)
(E501)
🪛 GitHub Actions: Tests: Common
cratedb_toolkit/admin/xmover/analysis/shard.py
[error] 853-853: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' when evaluating annotation 'str | None' in ShardMonitor.monitor_shards. Python 3.9 does not support the PEP 604 union operator. This occurred during 'pytest -m cfr' collection. Upgrade to Python 3.10+ or replace with 'Optional[str]' / 'Union[str, None]'.
cratedb_toolkit/admin/xmover/cli.py
[error] 1-1: Ruff format check detected formatting issues. This file would be reformatted by Ruff. Step: Poe check (lint[0]); exit code 1.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build-and-publish
- GitHub Check: build-and-publish
- GitHub Check: CFR for OS macos-latest
- GitHub Check: CFR for OS ubuntu-latest
- GitHub Check: CFR for OS windows-latest
- GitHub Check: CFR for OS macos-13
🔇 Additional comments (1)
cratedb_toolkit/admin/xmover/analysis/shard.py (1)
936-941
: Verifyschema_name
onShardInfo
before updating compound ID
Including the schema in the shard‐ID string is a sound approach to prevent collisions when the same table name exists in multiple schemas. Before applying the refactor, please confirm:• The
ShardInfo
type actually exposes aschema_name
attribute.
• All callers of_get_shard_compound_id
(e.g., map keys, logging, UI components) can handle the new format.If
schema_name
is available and you update the method, your change might look like:- def _get_shard_compound_id(self, shard: ShardInfo) -> str: - if self.sort_by == 'node': - return f"{shard.node_name}-{shard.table_name}-{shard.shard_id}" - else: - return f"{shard.table_name}-{shard.shard_id}-{shard.node_name}" + def _get_shard_compound_id(self, shard: ShardInfo) -> str: + # include schema to avoid collisions across schemas + if self.sort_by == "node": + return f"{shard.node_name}-{shard.schema_name}.{shard.table_name}-{shard.shard_id}" + return f"{shard.schema_name}.{shard.table_name}-{shard.shard_id}-{shard.node_name}"Please verify the presence of
schema_name
and update downstream usage as needed.
@@ -1,16 +1,19 @@ | |||
""" | |||
Shard analysis and rebalancing logic for CrateDB | |||
""" | |||
|
|||
import datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused import to satisfy Ruff (F401)
datetime
is unused in this module version.
-import datetime
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import datetime |
🧰 Tools
🪛 Ruff (0.12.2)
4-4: datetime
imported but unused
Remove unused import: datetime
(F401)
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around line 4, the module
currently imports datetime but does not use it; remove the unused import line
"import datetime" to satisfy the Ruff F401 lint rule and keep imports minimal.
from typing import Any, Dict, List, Optional, Set, Tuple, Union | ||
from unittest import result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused unittest.result
import (Ruff F401)
-from unittest import result
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from unittest import result |
🧰 Tools
🪛 Ruff (0.12.2)
10-10: unittest.result
imported but unused
Remove unused import: unittest.result
(F401)
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around line 10, the import
"from unittest import result" is unused and flagged by Ruff F401; remove that
import line (or replace it with the correct needed import if the symbol was
intended to be used) so the file no longer imports an unused name.
self.table_filter: str|None = None | ||
self.sort_by: str = 'heat' | ||
|
||
def monitor_shards(self, table_filter: str | None, interval_in_seconds: int = 5, repeat: int = 10, n_shards: int = 40, sort_by: str = 'heat'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python 3.9 compatibility: replace PEP 604 unions with Optional[...]
Pipeline fails at runtime when evaluating str | None
. Use Optional[str]
to support Python 3.9.
- self.table_filter: str|None = None
+ from typing import Optional # if not already imported at top
+ self.table_filter: Optional[str] = None
And update the method signature:
-def monitor_shards(self, table_filter: str | None, interval_in_seconds: int = 5, repeat: int = 10, n_shards: int = 40, sort_by: str = 'heat'):
+def monitor_shards(
+ self,
+ table_filter: Optional[str],
+ interval_in_seconds: int = 5,
+ repeat: int = 10,
+ n_shards: int = 40,
+ sort_by: str = "heat",
+ watch: bool = False, # optional if implementing CLI watch flag
+):
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.12.2)
853-853: Line too long (146 > 120)
(E501)
🪛 GitHub Actions: Tests: Common
[error] 853-853: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' when evaluating annotation 'str | None' in ShardMonitor.monitor_shards. Python 3.9 does not support the PEP 604 union operator. This occurred during 'pytest -m cfr' collection. Upgrade to Python 3.10+ or replace with 'Optional[str]' / 'Union[str, None]'.
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 850 to 853, the
code uses PEP 604 union types (str | None) which break on Python 3.9; change all
occurrences to typing.Optional[str] (e.g., self.table_filter: Optional[str] =
None and def monitor_shards(self, table_filter: Optional[str], ...)), and ensure
you add/import "from typing import Optional" at the top of the file if not
already present.
with Live(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas), refresh_per_second=4, console=console) as live_shards: | ||
while True: | ||
sleep(interval_in_seconds) | ||
self.refresh_data() | ||
live_shards.update(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas)) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Wrap long Rich Live line(s) to pass Ruff E501 and optionally drive watch
Two lines exceed 120 chars. Also, go_live
is hardcoded; consider using a watch
parameter from the CLI (optional).
- go_live = False
+ go_live = watch # drive from CLI; default False
if go_live:
- with Live(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas), refresh_per_second=4, console=console) as live_shards:
+ with Live(
+ self.generate_shards_table(
+ self._get_top_shards(self.latest_shards, n_shards),
+ self.seq_deltas,
+ ),
+ refresh_per_second=4,
+ console=console,
+ ) as live_shards:
while True:
sleep(interval_in_seconds)
self.refresh_data()
- live_shards.update(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas))
+ live_shards.update(
+ self.generate_shards_table(
+ self._get_top_shards(self.latest_shards, n_shards),
+ self.seq_deltas,
+ )
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
with Live(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas), refresh_per_second=4, console=console) as live_shards: | |
while True: | |
sleep(interval_in_seconds) | |
self.refresh_data() | |
live_shards.update(self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas)) | |
else: | |
go_live = watch # drive from CLI; default False | |
if go_live: | |
with Live( | |
self.generate_shards_table( | |
self._get_top_shards(self.latest_shards, n_shards), | |
self.seq_deltas, | |
), | |
refresh_per_second=4, | |
console=console, | |
) as live_shards: | |
while True: | |
sleep(interval_in_seconds) | |
self.refresh_data() | |
live_shards.update( | |
self.generate_shards_table( | |
self._get_top_shards(self.latest_shards, n_shards), | |
self.seq_deltas, | |
) | |
) | |
else: | |
# existing non-live rendering path |
🧰 Tools
🪛 Ruff (0.12.2)
864-864: Line too long (173 > 120)
(E501)
868-868: Line too long (135 > 120)
(E501)
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 864 to 869, two
lines calling Live(..., refresh_per_second=4, console=console) and
live_shards.update(...) exceed 120 characters; wrap the long Rich Live
invocation and the update call to multiple shorter concatenated/indented lines
to satisfy Ruff E501 (<=120 chars). Also replace the hardcoded go_live/watch
behavior by wiring a CLI parameter (e.g., watch or go_live) into this branch and
use that parameter instead of a literal so the visibility loop is controlled
externally.
str(self.reference_shards[shard_compound_id].seq_stats_max_seq_no), | ||
str(shard.seq_stats_max_seq_no) | ||
) | ||
console.print(shards_table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid double-printing table (UI prints twice per iteration)
display_shards_table_rows
prints the table, and then the caller prints it again. Remove the internal print to avoid duplicate output.
- console.print(shards_table)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
console.print(shards_table) |
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around line 934, the function
display_shards_table_rows is printing the shards_table internally and the caller
also prints it, causing duplicate UI output; remove the internal
console.print(shards_table) call (or change the function to only return the
table and let the caller handle printing) so the table is printed exactly once
by the caller.
refreshed_number = shard.seq_stats_max_seq_no | ||
reference = reference_shards[shard_compound_id].seq_stats_max_seq_no | ||
|
||
if refreshed_number < reference: | ||
refreshed_number += 2 ** 63 - 1 | ||
|
||
seq_result[shard_compound_id] = refreshed_number - reference | ||
size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Defensive handling of None seq-no and rollover magic constant
- Coalesce
seq_stats_max_seq_no
to 0 to avoid TypeError when the DB returns NULLs (e.g., non-STARTED shards). - The rollover adjustment uses
2 ** 63 - 1
. Please confirm this matches CrateDB’s seq_no range; otherwise compute delta against a modulo base or reset reference.
- refreshed_number = shard.seq_stats_max_seq_no
- reference = reference_shards[shard_compound_id].seq_stats_max_seq_no
+ refreshed_number = int(shard.seq_stats_max_seq_no or 0)
+ reference = int(reference_shards[shard_compound_id].seq_stats_max_seq_no or 0)
- if refreshed_number < reference:
- refreshed_number += 2 ** 63 - 1
+ # Handle rollover: if sequence decreased, assume wraparound.
+ # TODO: confirm expected max range for seq_no in CrateDB.
+ if refreshed_number < reference:
+ refreshed_number += (2**63 - 1)
🏁 Script executed:
#!/bin/bash
# Locate references to seq_stats_max_seq_no and show context
rg -nH -C3 'seq_stats_max_seq_no' .
Length of output: 2745
Improve seq-no handling: coalesce NULLs → 0 & correct wrap-around delta logic
To avoid runtime errors when seq_stats_max_seq_no
is NULL (e.g. non-STARTED shards) and ensure the rollover fixup uses the full 64-bit modulus, please apply the following changes:
• In cratedb_toolkit/admin/xmover/util/database.py (row[12]) and the corresponding model, coalesce the DB value to 0 so downstream code never sees None
and TypeErrors from comparisons or arithmetic.
• In cratedb_toolkit/admin/xmover/analysis/shard.py (lines 954–961), update the seq-no math to:
- Cast to
int
withor 0
defaults - On wrap-around (current < reference), add
1 << 64
(the full unsigned 64-bit modulus), or use modular arithmetic ((refreshed - reference) % (1 << 64)
) to get the correct delta.
Proposed diff:
- refreshed_number = shard.seq_stats_max_seq_no
- reference = reference_shards[shard_compound_id].seq_stats_max_seq_no
+ refreshed_number = int(shard.seq_stats_max_seq_no or 0)
+ reference = int(reference_shards[shard_compound_id].seq_stats_max_seq_no or 0)
- if refreshed_number < reference:
- refreshed_number += 2 ** 63 - 1
+ if refreshed_number < reference:
+ # Handle signed-long overflow: wrap around using full 64-bit modulus
+ refreshed_number += 1 << 64
- seq_result[shard_compound_id] = refreshed_number - reference
+ # Or simply: seq_result[shard_compound_id] = (refreshed_number - reference) % (1 << 64)
+ seq_result[shard_compound_id] = refreshed_number - reference
size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
refreshed_number = shard.seq_stats_max_seq_no | |
reference = reference_shards[shard_compound_id].seq_stats_max_seq_no | |
if refreshed_number < reference: | |
refreshed_number += 2 ** 63 - 1 | |
seq_result[shard_compound_id] = refreshed_number - reference | |
size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb | |
refreshed_number = int(shard.seq_stats_max_seq_no or 0) | |
reference = int(reference_shards[shard_compound_id].seq_stats_max_seq_no or 0) | |
if refreshed_number < reference: | |
# Handle signed-long overflow: wrap around using full 64-bit modulus | |
refreshed_number += 1 << 64 | |
# Or simply: seq_result[shard_compound_id] = (refreshed_number - reference) % (1 << 64) | |
seq_result[shard_compound_id] = refreshed_number - reference | |
size_result[shard_compound_id] = shard.size_gb - reference_shards[shard_compound_id].size_gb |
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 954–961 and
cratedb_toolkit/admin/xmover/util/database.py (row[12]) the code can encounter
NULL seq_stats_max_seq_no and uses an incorrect wrap-around fixup; coalesce DB
seq_no to 0 in database.py (and update the model) so callers never see None, and
in shard.py cast seq values to int with a default of 0 and compute the delta
using full 64-bit modular arithmetic (e.g. (refreshed - reference) % (1 << 64))
instead of adding 2**63 - 1 when refreshed < reference so wrap-around is handled
correctly.
def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]: | ||
if n_shards < 1: | ||
return sorted_shards[:n_shards] | ||
else: | ||
return sorted_shards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Top-N bug: slices applied only when n_shards < 1
This logic shows all shards for positive N and slices only for negative values. Invert the condition so positive n_shards
returns the top-N.
[uggest_essential_refactor]
- def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]:
- if n_shards < 1:
- return sorted_shards[:n_shards]
- else:
- return sorted_shards
+ def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]:
+ if n_shards > 0:
+ return sorted_shards[:n_shards]
+ return sorted_shards
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]: | |
if n_shards < 1: | |
return sorted_shards[:n_shards] | |
else: | |
return sorted_shards | |
def _get_top_shards(self, sorted_shards: list[ShardInfo], n_shards: int) -> list[ShardInfo]: | |
if n_shards > 0: | |
return sorted_shards[:n_shards] | |
return sorted_shards |
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 977 to 981, the
condition is inverted so slices are applied only when n_shards is negative;
change the logic to apply slicing when n_shards is positive: if n_shards > 0
return sorted_shards[:n_shards], else return sorted_shards (or return an empty
list if you want n_shards == 0 to yield no shards).
cratedb_toolkit/admin/xmover/cli.py
Outdated
@main.command() | ||
@click.option("--table", "-t", default=None, help="Analyze specific table only") | ||
@click.option("--wait-time", "-w", default=10, help="The number of seconds to wait before checking the shards again. The more the wait the more accurate the results will be (default: 10)") | ||
@click.option("--repeat", "-r", default=1, help="The number of times the shards will be checked. The more times the more accurate the results will be. Use -1 for continuous check (default 1)") | ||
@click.option("--max-results", "-m", default=40, help="The number of shards that will be displayed (default: 40)") | ||
@click.option("--sort-by", "-s", default="heat", help="How the shard table is sorted. Valid values are heat, node or table (default: heat)") | ||
@click.pass_context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fix Ruff E501: wrap long Click option lines and validate --sort-by
choices
The three @click.option
lines exceed the 120-char limit and break the pipeline. Also, constrain --sort-by
to the supported values to catch typos early.
Apply this diff to wrap and validate:
-@main.command()
-@click.option("--table", "-t", default=None, help="Analyze specific table only")
-@click.option("--wait-time", "-w", default=10, help="The number of seconds to wait before checking the shards again. The more the wait the more accurate the results will be (default: 10)")
-@click.option("--repeat", "-r", default=1, help="The number of times the shards will be checked. The more times the more accurate the results will be. Use -1 for continuous check (default 1)")
-@click.option("--max-results", "-m", default=40, help="The number of shards that will be displayed (default: 40)")
-@click.option("--sort-by", "-s", default="heat", help="How the shard table is sorted. Valid values are heat, node or table (default: heat)")
+@main.command(name="monitor-shards")
+@click.option("--table", "-t", default=None, help="Analyze specific table only")
+@click.option(
+ "--wait-time",
+ "-w",
+ default=10,
+ help=(
+ "Seconds to wait between checks. "
+ "Longer waits yield more accurate deltas (default: 10)."
+ ),
+)
+@click.option(
+ "--repeat",
+ "-r",
+ default=1,
+ help=(
+ "Number of iterations to run. Use -1 for continuous monitoring "
+ "(default: 1)."
+ ),
+)
+@click.option(
+ "--max-results",
+ "-m",
+ default=40,
+ help="Maximum number of shards to display (default: 40)",
+)
+@click.option(
+ "--sort-by",
+ "-s",
+ type=click.Choice(["heat", "node", "table"]),
+ default="heat",
+ help="Sort by heat, node, or table (default: heat)",
+)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@main.command() | |
@click.option("--table", "-t", default=None, help="Analyze specific table only") | |
@click.option("--wait-time", "-w", default=10, help="The number of seconds to wait before checking the shards again. The more the wait the more accurate the results will be (default: 10)") | |
@click.option("--repeat", "-r", default=1, help="The number of times the shards will be checked. The more times the more accurate the results will be. Use -1 for continuous check (default 1)") | |
@click.option("--max-results", "-m", default=40, help="The number of shards that will be displayed (default: 40)") | |
@click.option("--sort-by", "-s", default="heat", help="How the shard table is sorted. Valid values are heat, node or table (default: heat)") | |
@click.pass_context | |
@main.command(name="monitor-shards") | |
@click.option("--table", "-t", default=None, help="Analyze specific table only") | |
@click.option( | |
"--wait-time", | |
"-w", | |
default=10, | |
help=( | |
"Seconds to wait between checks. " | |
"Longer waits yield more accurate deltas (default: 10)." | |
), | |
) | |
@click.option( | |
"--repeat", | |
"-r", | |
default=1, | |
help=( | |
"Number of iterations to run. Use -1 for continuous monitoring " | |
"(default: 1)." | |
), | |
) | |
@click.option( | |
"--max-results", | |
"-m", | |
default=40, | |
help="Maximum number of shards to display (default: 40)", | |
) | |
@click.option( | |
"--sort-by", | |
"-s", | |
type=click.Choice(["heat", "node", "table"]), | |
default="heat", | |
help="Sort by heat, node, or table (default: heat)", | |
) | |
@click.pass_context |
🧰 Tools
🪛 Ruff (0.12.2)
67-67: Line too long (188 > 120)
(E501)
68-68: Line too long (192 > 120)
(E501)
70-70: Line too long (140 > 120)
(E501)
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/cli.py around lines 65 to 71, several
@click.option decorators exceed the 120-char limit and the --sort-by option is
not constrained; wrap the long option lines to keep them under 120 characters
(split arguments across lines so each line is <=120 chars) and change the
--sort-by option to use click.Choice(["heat","node","table"]) (with an
appropriate default of "heat") so invalid values are rejected early; keep other
option defaults and help text the same.
s.routing_state, | ||
s.seq_no_stats['max_seq_no'], | ||
s.seq_no_stats['global_checkpoint'], | ||
s.seq_no_stats['local_checkpoint'] | ||
FROM sys.shards s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Seq-no stats can be NULL for non-STARTED shards; coalesce at the source
When for_analysis=True
, non-STARTED shards may return NULLs for seq_no_stats. Downstream code does arithmetic on these fields and will crash. Coalesce to 0 in SQL to guarantee numeric values.
- s.routing_state,
- s.seq_no_stats['max_seq_no'],
- s.seq_no_stats['global_checkpoint'],
- s.seq_no_stats['local_checkpoint']
+ s.routing_state,
+ COALESCE(s.seq_no_stats['max_seq_no'], 0),
+ COALESCE(s.seq_no_stats['global_checkpoint'], 0),
+ COALESCE(s.seq_no_stats['local_checkpoint'], 0)
🏁 Script executed:
#!/bin/bash
set -eux
# Locate the target file
file=$(find . -type f -path "*cratedb_toolkit/admin/xmover/util/database.py" -print -quit)
echo "Inspecting $file"
# Display context around the SQL snippet
sed -n '130,200p' "$file"
# Find where for_analysis parameter is referenced in this file
grep -n "for_analysis" "$file"
# Search the entire repo for usage of max_seq_no to locate downstream arithmetic
grep -R "max_seq_no" -n .
Length of output: 4061
🏁 Script executed:
#!/bin/bash
set -eux
# Display the function definition containing the seq_no_stats query
file="./cratedb_toolkit/admin/xmover/util/database.py"
echo "===== Part of database.py (lines 1-140) ====="
sed -n '1,140p' "$file"
# Display the ShardInfo model to see how seq_stats_* fields are defined
echo "===== Part of model.py (lines 1-200) ====="
sed -n '1,200p' cratedb_toolkit/admin/xmover/model.py
Length of output: 9969
Coalesce seq_no_stats at the source to avoid NULLs for non-STARTED shards
When for_analysis=True
, the SQL will include shards in states other than STARTED, and s.seq_no_stats[...]
can be NULL. Since downstream code assigns these directly to integer fields and performs arithmetic, those NULLs will cause runtime errors. Coalesce to 0 in the SELECT to guarantee non-null numeric values.
• File: cratedb_toolkit/admin/xmover/util/database.py
Lines: 146–150 (in the get_shards_info
query)
Suggested diff:
s.routing_state,
- s.seq_no_stats['max_seq_no'],
- s.seq_no_stats['global_checkpoint'],
- s.seq_no_stats['local_checkpoint']
+ COALESCE(s.seq_no_stats['max_seq_no'], 0) AS max_seq_no,
+ COALESCE(s.seq_no_stats['global_checkpoint'], 0) AS global_checkpoint,
+ COALESCE(s.seq_no_stats['local_checkpoint'], 0) AS local_checkpoint
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
s.routing_state, | |
s.seq_no_stats['max_seq_no'], | |
s.seq_no_stats['global_checkpoint'], | |
s.seq_no_stats['local_checkpoint'] | |
FROM sys.shards s | |
s.routing_state, | |
COALESCE(s.seq_no_stats['max_seq_no'], 0) AS max_seq_no, | |
COALESCE(s.seq_no_stats['global_checkpoint'], 0) AS global_checkpoint, | |
COALESCE(s.seq_no_stats['local_checkpoint'], 0) AS local_checkpoint | |
FROM sys.shards s |
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/util/database.py around lines 146–150, the
SELECT uses s.seq_no_stats['max_seq_no'], s.seq_no_stats['global_checkpoint'],
and s.seq_no_stats['local_checkpoint'] directly which can be NULL when
for_analysis=True (non-STARTED shards); update the query to COALESCE each of
those expressions to 0 in the SELECT so the returned values are guaranteed
non-null integers and downstream arithmetic/assignments won’t error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (8)
cratedb_toolkit/admin/xmover/analysis/shard.py (8)
4-4
: Remove unused import to fix Ruff F401The
datetime
import is unused and flagged by static analysis.
8-8
: Remove unused import to fix Ruff F401The
sleep
import fromtime
is unused and flagged by static analysis.
10-10
: Remove unused import to fix Ruff F401The
unittest.result
import is unused and flagged by static analysis.
922-922
: Remove duplicate table printingThe function prints the table internally and the caller also prints it, causing duplicate output.
942-949
: Fix potential None handling and rollover arithmeticThe code doesn't handle potential None values for seq_stats_max_seq_no and uses incorrect rollover arithmetic.
965-969
: Fix inverted logic for top-N shard filteringThe condition is inverted - it shows all shards for positive N and slices only for negative values.
853-853
: Fix Python 3.9 compatibility and line lengthThe method signature uses PEP 604 union syntax and exceeds line length limits.
Apply this diff to fix both issues:
- def monitor_shards(self, table_filter: str | None, interval_in_seconds: int = 5, repeat: int = 10, n_shards: int = 40, sort_by: str = 'heat'): + def monitor_shards( + self, + table_filter: Optional[str], + interval_in_seconds: int = 5, + repeat: int = 10, + n_shards: int = 40, + sort_by: str = 'heat', + ):
862-874
: Implement proper watch functionality and fix infinite loopThe current implementation has several issues:
- Infinite loop without proper exit condition for non-watch mode
- Missing watch parameter functionality
- Line length violations
Apply this diff to fix the monitoring logic:
iterations = 0 + go_live = False # TODO: wire from CLI watch parameter while True: sleep(interval_in_seconds) self.refresh_data() - shards_table = self.generate_shards_table(self._get_top_shards(self.latest_shards, n_shards), self.seq_deltas) + shards_table = self.generate_shards_table( + self._get_top_shards(self.latest_shards, n_shards), + self.seq_deltas + ) console.print(shards_table) - nodes_table = self.generate_nodes_table(self._get_nodes_heat_info(self.reference_shards, self.seq_deltas)) + nodes_table = self.generate_nodes_table( + self._get_nodes_heat_info(self.reference_shards, self.seq_deltas) + ) console.print(nodes_table) iterations += 1 if 0 < repeat <= iterations: break
🧹 Nitpick comments (2)
cratedb_toolkit/admin/xmover/analysis/shard.py (2)
956-956
: Fix line length violationThe line exceeds the 120-character limit.
Apply this diff to wrap the long line:
- updated_shards: list[ShardInfo] = [s for s in self.analyzer.shards if not self.table_filter or self.table_filter == s.table_name] + updated_shards: list[ShardInfo] = [ + s for s in self.analyzer.shards + if not self.table_filter or self.table_filter == s.table_name + ]
958-962
: Fix line length violations in sorting logicMultiple lines exceed the 120-character limit.
Apply this diff to wrap the long lines:
if self.sort_by == 'heat': - self.latest_shards = sorted(updated_shards, key=lambda s: self.seq_deltas[self._get_shard_compound_id(s)], - reverse=True) + self.latest_shards = sorted( + updated_shards, + key=lambda s: self.seq_deltas[self._get_shard_compound_id(s)], + reverse=True + ) else: - self.latest_shards = sorted(updated_shards, key=lambda s: self._get_shard_compound_id(s)) + self.latest_shards = sorted( + updated_shards, + key=lambda s: self._get_shard_compound_id(s) + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
cratedb_toolkit/admin/xmover/analysis/shard.py
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cratedb_toolkit/admin/xmover/analysis/shard.py (3)
cratedb_toolkit/admin/xmover/model.py (2)
ShardInfo
(33-54)size_gb
(84-86)cratedb_toolkit/admin/xmover/cli.py (1)
monitor_shards
(72-77)cratedb_toolkit/admin/xmover/util/format.py (1)
format_size
(1-8)
🪛 Ruff (0.12.2)
cratedb_toolkit/admin/xmover/analysis/shard.py
4-4: datetime
imported but unused
Remove unused import: datetime
(F401)
10-10: unittest.result
imported but unused
Remove unused import: unittest.result
(F401)
16-16: rich.live.Live
imported but unused
Remove unused import: rich.live.Live
(F401)
853-853: Line too long (146 > 120)
(E501)
866-866: Line too long (122 > 120)
(E501)
956-956: Line too long (137 > 120)
(E501)
🪛 GitHub Actions: Tests: Common
cratedb_toolkit/admin/xmover/analysis/shard.py
[error] 853-853: pytest -m pymongo failed during collection: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' in ShardMonitor.monitor_shards signature. This uses Python 3.10+'s PEP 604 union syntax ('str | None'), not supported on Python 3.9. Replace with Optional[str] or upgrade Python.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-and-publish
- GitHub Check: build-and-publish
|
||
from rich import box | ||
from rich.console import Console | ||
from rich.panel import Panel | ||
from rich.table import Table | ||
from rich.live import Live |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused import to fix Ruff F401
The Live
import from rich.live
is unused and flagged by static analysis.
-from rich.live import Live
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from rich.live import Live | |
-from rich.live import Live |
🧰 Tools
🪛 Ruff (0.12.2)
16-16: rich.live.Live
imported but unused
Remove unused import: rich.live.Live
(F401)
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around line 16 the import
"from rich.live import Live" is unused and triggers Ruff F401; remove that
import statement (or, if Live is intended to be used later, use it where
required) so the unused symbol is gone and re-run linters to confirm the warning
is resolved.
class ShardMonitor: | ||
def __init__(self, analyzer: ShardAnalyzer): | ||
self.analyzer = analyzer | ||
self.reference_shards: dict[str, ShardInfo] | ||
self.latest_shards: list[ShardInfo] | ||
self.seq_deltas: dict[str, int] | ||
self.size_deltas: dict[str, float] | ||
|
||
self.table_filter: str|None = None | ||
self.sort_by: str = 'heat' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix Python 3.9 compatibility and improve type annotations
The code uses PEP 604 union syntax (str | None
) which breaks on Python 3.9 and has missing type annotations for instance variables.
Apply this diff to fix compatibility and type annotations:
+from typing import Optional
+
class ShardMonitor:
def __init__(self, analyzer: ShardAnalyzer):
self.analyzer = analyzer
- self.reference_shards: dict[str, ShardInfo]
- self.latest_shards: list[ShardInfo]
- self.seq_deltas: dict[str, int]
- self.size_deltas: dict[str, float]
+ self.reference_shards: dict[str, ShardInfo] = {}
+ self.latest_shards: list[ShardInfo] = []
+ self.seq_deltas: dict[str, int] = {}
+ self.size_deltas: dict[str, float] = {}
- self.table_filter: str|None = None
+ self.table_filter: Optional[str] = None
self.sort_by: str = 'heat'
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
class ShardMonitor: | |
def __init__(self, analyzer: ShardAnalyzer): | |
self.analyzer = analyzer | |
self.reference_shards: dict[str, ShardInfo] | |
self.latest_shards: list[ShardInfo] | |
self.seq_deltas: dict[str, int] | |
self.size_deltas: dict[str, float] | |
self.table_filter: str|None = None | |
self.sort_by: str = 'heat' | |
from typing import Optional | |
class ShardMonitor: | |
def __init__(self, analyzer: ShardAnalyzer): | |
self.analyzer = analyzer | |
self.reference_shards: dict[str, ShardInfo] = {} | |
self.latest_shards: list[ShardInfo] = [] | |
self.seq_deltas: dict[str, int] = {} | |
self.size_deltas: dict[str, float] = {} | |
self.table_filter: Optional[str] = None | |
self.sort_by: str = 'heat' |
🤖 Prompt for AI Agents
In cratedb_toolkit/admin/xmover/analysis/shard.py around lines 842-852, the
class uses PEP 604 union syntax and bare built-in generics which break on Python
3.9 and leaves instance attributes uninitialized/unannotated; change type hints
to use typing imports (from typing import Dict, List, Optional) and
annotate/initialize the attributes, e.g. reference_shards: Dict[str, ShardInfo]
= {} , latest_shards: List[ShardInfo] = [], seq_deltas: Dict[str, int] = {},
size_deltas: Dict[str, float] = {}, and table_filter: Optional[str] = None,
keeping sort_by: str = 'heat'. Ensure typing imports are added at top of file.
2d98ae3
to
92e1d81
Compare
dac2a6d
to
cb05b4d
Compare
About
At GH-526, we are thinking about next steps in user ergonomics when debugging a busy cluster. Based on the idea of computing deltas of certain metrics available through CrateDB's system tables about shards, @juanpardo starts exploring a few ideas here. Thanks!
Install
uv pip install --upgrade 'cratedb-toolkit @ git+https://github.com/crate/cratedb-toolkit.git@jp/xmover-shard-hot-monitor'
Details
Static reports are always topmost priority. In an optional variant, a
--watch
-like interval-updating view could also make sense. However, it is difficult to learn about historic progress/changes in this case, i.e. when leaving the screen for a while, so discrete output is always preferred as the primary operation mode.