
feat(redshift-mcp-server): add describe_execution_plan tool #1927

Closed
saeedma8 wants to merge 0 commits into awslabs:main from saeedma8:main

@saeedma8 commented Dec 10, 2025

Fixes #1019

Summary

Added the describe_execution_plan tool to the Redshift MCP server. The tool uses Redshift's EXPLAIN VERBOSE command to show how a query would be executed, giving insight into query performance without actually running the query.
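The wrapping step can be sketched minimally. The sketch assumes the tool simply prefixes the caller's statement with the EXPLAIN keyword, and the review later notes that callers should not include EXPLAIN themselves. `build_explain_sql` is a hypothetical name, not the PR's actual function.

```python
import re

# Hypothetical helper for illustration; not the PR's implementation.
_EXPLAIN_PREFIX = re.compile(r"^\s*explain\b", re.IGNORECASE)


def build_explain_sql(sql: str, verbose: bool = True) -> str:
    """Wrap a single SQL statement in EXPLAIN [VERBOSE] so it is planned, not run."""
    # Drop surrounding whitespace and any trailing semicolons.
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("SQL must not be empty")
    if _EXPLAIN_PREFIX.match(statement):
        # The tool adds EXPLAIN itself, so callers must not include it.
        raise ValueError("Do not include EXPLAIN; it is added automatically")
    keyword = "EXPLAIN VERBOSE" if verbose else "EXPLAIN"
    return f"{keyword} {statement}"
```

This sketch does not split or validate multi-statement input. In the server, that concern belongs to the separate read-only and transaction-breaker protections described in the testing summary.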

Changes

  • Added ExecutionPlan class to the models file
  • Added describe_execution_plan to the redshift and server files
  • Added unit tests
  • Updated README

User experience

Users can get both the standard execution plan and the verbose execution plan with this tool.

Checklist

If an item doesn't apply to your change, please leave it unchecked.

  • I have reviewed the contributing guidelines
  • I have performed a self-review of this change
  • Changes have been tested
  • Changes are documented

Is this a breaking change? (N)

RFC issue number: #1019

Checklist:

  • [N/A] Migration process documented
  • [N/A] Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

Testing explain only

 % kiro-cli chat --no-interactive \
--trust-tools "@{awslabsredshift-mcp-server}/*,execute_bash,fs_read" \
"only use describe execution plan in redshift mcp server to describe the execution plan for three queries from sample data in redshift-cluster-1 in us-east-1; 1. retrieve the top 3 users who spent the most 2. least popular product 3. top 5 most popular products, suggest any plan optimization"

 > Let me first discover the database structure in redshift-cluster-1 to write accurate queries.
Running tool list_databases with the param (from mcp server: awslabsredshift-mcp-server)
 ⋮  {
 ⋮    "cluster_identifier": "redshift-cluster-1"
 ⋮  }
 - Completed in 9.111s
 
 
 ...
 
 > I'll use the TICKIT schema for the "users who spent the most" query (it has users and sales tables), and the TPCDS schema for the product popularity queries (it has item and store_sales tables). Let me check the relevant column structures to write accurate queries.

...
 
> Now I have the full schema picture. Here's the summary:

- **tickit.users**: DISTKEY=userid, SORTKEY=userid — join key for buyers
- **tickit.sales**: DISTKEY=listid, SORTKEY=dateid — buyerid references users, pricepaid is the spend amount
- **tpcds.store_sales**: DISTKEY=ss_item_sk, SORTKEY=ss_sold_date_sk — ss_quantity for popularity
- **tpcds.item**: DISTKEY=i_item_sk, SORTKEY=i_category — i_product_name for product names

Now let me run all three execution plans in parallel:


Running tool describe_execution_plan with the param (from mcp server: awslabsredshift-mcp-server)
 ⋮  {
 ⋮    "cluster_identifier": "redshift-cluster-1",
 ⋮    "database_name": "sample_data_dev",
 ⋮    "sql": "SELECT u.userid, u.firstname, u.lastname, SUM(s.pricepaid) AS total_spent FROM tickit.users u JOIN tickit.sales s ON u.userid = s.buyerid GROUP BY u.userid, u.firstname, u.lastname ORDER BY total_spent DESC LIMIT 3"
 ⋮  }
Running tool describe_execution_plan with the param (from mcp server: awslabsredshift-mcp-server)
 ⋮  {
 ⋮    "cluster_identifier": "redshift-cluster-1",
 ⋮    "database_name": "sample_data_dev",
 ⋮    "sql": "SELECT i.i_item_sk, i.i_product_name, SUM(ss.ss_quantity) AS total_qty_sold FROM tpcds.store_sales ss JOIN tpcds.item i ON ss.ss_item_sk = i.i_item_sk GROUP BY i.i_item_sk, i.i_product_name ORDER BY total_qty_sold ASC LIMIT 1"
 ⋮  }
Running tool describe_execution_plan with the param (from mcp server: awslabsredshift-mcp-server)
 ⋮  {
 ⋮    "cluster_identifier": "redshift-cluster-1",
 ⋮    "database_name": "sample_data_dev",
 ⋮    "sql": "SELECT i.i_item_sk, i.i_product_name, SUM(ss.ss_quantity) AS total_qty_sold FROM tpcds.store_sales ss JOIN tpcds.item i ON ss.ss_item_sk = i.i_item_sk GROUP BY i.i_item_sk, i.i_product_name ORDER BY total_qty_sold DESC LIMIT 5"
 ⋮  }
 - Completed in 39.909s

 - Completed in 37.422s

 - Completed in 38.19s

> Here's the full analysis of all three execution plans:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


## Query 1: Top 3 Users Who Spent the Most

```sql
SELECT u.userid, u.firstname, u.lastname, SUM(s.pricepaid) AS total_spent
FROM tickit.users u JOIN tickit.sales s ON u.userid = s.buyerid
GROUP BY u.userid, u.firstname, u.lastname
ORDER BY total_spent DESC LIMIT 3
```


### Plan Tree (bottom-up)
```
XN Limit  (rows=3)
└─ XN Merge  (rows=48,513)
   └─ XN Network  (Send to leader)
      └─ XN Sort  (Sort Key: sum(s.pricepaid))
         └─ XN HashAggregate  (rows=48,513)
            └─ XN Hash Join DS_BCAST_INNER  (rows=355,414)
               ├─ XN Seq Scan on sales s  (rows=344,912, cost=3,449)
               └─ XN Hash
                  └─ XN Seq Scan on users u  (rows=49,990, cost=500)
```


### Key Issue: DS_BCAST_INNER on the join
- sales DISTKEY = listid, users DISTKEY = userid
- The join is on buyerid = userid. userid is the DISTKEY of users, but buyerid is not the DISTKEY of sales (listid), so matching rows are not co-located and the users table (49K rows) is broadcast to all nodes
- Total join cost: ~16 billion (dominated by the broadcast + aggregate)
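Plans like this one can be read mechanically, because Redshift prints a join distribution label on each join node. The sketch below maps the labels described in the Amazon Redshift documentation on query plans to short meanings and pulls them out of plan text. The function name and the sample lines are illustrative.

```python
import re

# Join distribution attributes, as described in the Amazon Redshift
# documentation on evaluating query plans.
DIST_ATTRIBUTES = {
    "DS_DIST_NONE": "No redistribution: matching rows are already co-located",
    "DS_DIST_ALL_NONE": "No redistribution: the inner table uses DISTSTYLE ALL",
    "DS_DIST_INNER": "The inner table is redistributed",
    "DS_DIST_OUTER": "The outer table is redistributed",
    "DS_BCAST_INNER": "A copy of the entire inner table is broadcast to all compute nodes",
    "DS_DIST_ALL_INNER": "The entire inner table is redistributed to a single slice",
    "DS_DIST_BOTH": "Both tables are redistributed",
}

_DIST_RE = re.compile(r"\b(DS_[A-Z_]+)\b")


def join_distributions(plan_text: str) -> list[tuple[str, str]]:
    """Return (attribute, meaning) pairs in the order they appear in the plan."""
    found = []
    for match in _DIST_RE.finditer(plan_text):
        label = match.group(1)
        if label in DIST_ATTRIBUTES:  # ignore unknown DS_* tokens
            found.append((label, DIST_ATTRIBUTES[label]))
    return found
```

Applied to the Query 1 join line `XN Hash Join DS_BCAST_INNER`, this returns the broadcast entry. That is the data movement the recommendations below try to remove.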

### Table Design
| Table | Rows | DISTKEY | SORTKEY |
|-------|------|---------|---------|
| tickit.sales | 344,912 | listid | dateid |
| tickit.users | 49,990 | userid | userid |

### Column Stats
- buyerid correlation: 0.05 (essentially random on disk)
- pricepaid correlation: 0.005 (random)
- userid correlation: 0.95 (well-ordered — good for the users table)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


## Query 2: Least Popular Product (LIMIT 1)

```sql
SELECT i.i_item_sk, i.i_product_name, SUM(ss.ss_quantity) AS total_qty_sold
FROM tpcds.store_sales ss JOIN tpcds.item i ON ss.ss_item_sk = i.i_item_sk
GROUP BY i.i_item_sk, i.i_product_name
ORDER BY total_qty_sold ASC LIMIT 1
```


### Plan Tree (bottom-up)
```
XN Limit  (rows=1)
└─ XN Merge  (rows=18,000)
   └─ XN Network  (Send to leader)
      └─ XN Sort  (Sort Key: sum(ss.ss_quantity))  ← DS_DIST_NONE
         └─ XN HashAggregate  (rows=18,000)  ← DS_DIST_NONE
            └─ XN Hash Join DS_DIST_NONE  (rows=2,880,404)
               ├─ XN Seq Scan on store_sales ss  (rows=2,880,404, cost=28,804)
               └─ XN Hash
                  └─ XN Seq Scan on item i  (rows=18,000, cost=180)
```


### Key Observation: DS_DIST_NONE — no data movement!
- store_sales DISTKEY = ss_item_sk, item DISTKEY = i_item_sk
- The join on ss_item_sk = i_item_sk is co-located — both tables are distributed on the join key
- This is the optimal distribution for this join pattern
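The co-location reasoning above can be written as a simplified check. This is a heuristic sketch under stated assumptions: KEY, ALL, or EVEN distribution only, and a single equality join column per side. The `TableDesign` type and `expected_join_movement` function are hypothetical. In practice the Redshift planner chooses the strategy from its own cost estimates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableDesign:
    """Hypothetical, simplified view of a table's distribution settings."""
    name: str
    diststyle: str  # "KEY", "ALL", or "EVEN" in this sketch
    distkey: Optional[str] = None


def expected_join_movement(outer: TableDesign, outer_col: str,
                           inner: TableDesign, inner_col: str) -> str:
    """Guess whether an equi-join outer_col = inner_col needs data movement."""
    if inner.diststyle == "ALL":
        # Every node already holds a full copy of the inner table.
        return "DS_DIST_ALL_NONE"
    outer_colocated = outer.diststyle == "KEY" and outer.distkey == outer_col
    inner_colocated = inner.diststyle == "KEY" and inner.distkey == inner_col
    if outer_colocated and inner_colocated:
        return "DS_DIST_NONE"
    # Otherwise something must be redistributed or broadcast; which option
    # the planner picks depends on its cost estimates.
    return "data movement required"
```

With the table designs listed in this analysis, the store_sales/item join comes out as DS_DIST_NONE and the sales/users join needs data movement. In this simplified model, a users table with DISTSTYLE ALL would turn the Query 1 join into DS_DIST_ALL_NONE.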

### Table Design
| Table | Rows | DISTKEY | SORTKEY |
|-------|------|---------|---------|
| tpcds.store_sales | 2,880,404 | ss_item_sk | ss_sold_date_sk |
| tpcds.item | 18,000 | i_item_sk | i_category |

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


## Query 3: Top 5 Most Popular Products

```sql
SELECT i.i_item_sk, i.i_product_name, SUM(ss.ss_quantity) AS total_qty_sold
FROM tpcds.store_sales ss JOIN tpcds.item i ON ss.ss_item_sk = i.i_item_sk
GROUP BY i.i_item_sk, i.i_product_name
ORDER BY total_qty_sold DESC LIMIT 5
```


### Plan Tree
Identical to Query 2 — same tables, same join, just different sort direction and LIMIT. Also achieves DS_DIST_NONE.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


## Optimization Recommendations

### 1. Query 1 (Top Spenders) — High Impact

The biggest issue is the DS_BCAST_INNER on the users/sales join. Two options:

- **Option A: Change tickit.users to DISTSTYLE ALL** — Since it's only 49K rows, replicating it to all nodes eliminates the broadcast entirely. This is the simplest fix.
  ```sql
  CREATE TABLE tickit.users_new DISTSTYLE ALL SORTKEY(userid) AS SELECT * FROM tickit.users;
  ```


- **Option B: Change tickit.sales DISTKEY to buyerid** — This co-locates the join but may hurt other queries that join on listid.

### 2. Compression (All Queries)

- tickit.sales.dateid — SORTKEY with ENCODE none. Add ENCODE az64 to save storage/I/O.
- tickit.users — 11 boolean columns have ENCODE none. Use ENCODE AUTO to compress them.
- tpcds.store_sales.ss_sold_date_sk — SORTKEY with ENCODE none. Add compression.
- tpcds.item.i_category — SORTKEY with ENCODE none, 54 bytes avg width. Compressing this wide column would significantly reduce I/O.

### 3. Item Table Distribution (Queries 2 & 3)

The tpcds.item table (18K rows) already achieves DS_DIST_NONE because its DISTKEY matches the join key. However, the rule-based engine suggests changing it to DISTSTYLE ALL — this would be beneficial if you also join item on other keys in different queries. For these specific queries, the current design is already optimal.

### 4. Sort Cost Dominance

In all three queries, the Sort step dominates the cost (the 1000000... prefix indicates a sort-heavy plan). This is expected for ORDER BY + LIMIT patterns on aggregated results. No optimization needed here — Redshift handles this efficiently with its merge-sort across nodes.

### Summary

| Query | Join Distribution | Main Bottleneck | Fix |
|-------|------------------|-----------------|-----|
| Top 3 spenders | DS_BCAST_INNER ⚠️ | Users broadcast to all nodes | Change users to DISTSTYLE ALL |
| Least popular product | DS_DIST_NONE ✅ | Full scan of 2.8M rows (expected) | Already optimal |
| Top 5 popular products | DS_DIST_NONE ✅ | Full scan of 2.8M rows (expected) | Already optimal |

 ▸ Credits: 1.74 • Time: 4m 7s

E2E Testing

% echo 'Test if all the available to you Redshift Mcp Server tools are working. Check both provisioned and serverless clusters, including the database schema exploration in both. Check the SQL read-only protection, transaction breaker protection, failed user SQL behavior. Test new tool describe_execution_plan and return the explain result. Get the test scenario ideas from the unit tests under the project directory. Provide a short testing summary, one line per tool.' > /tmp/test_request.txt

% kiro-cli chat --no-interactive \
--trust-tools "@{awslabsredshift-mcp-server}/*,execute_bash,fs_read" \
"$(cat /tmp/test_request.txt)"

> Let me look at the key test scenarios from the unit tests, particularly around protected statements, read-only mode, and transaction breaker:

...


> All tests complete. Here's the full testing summary:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


## Testing Summary

All 7 Redshift MCP Server tools tested against both provisioned (redshift-cluster-1) and serverless (redshift-cluster-serverless) clusters.

| Tool | Result | Details |
|---|---|---|
| list_clusters | ✅ PASS | Discovered 5 clusters: 3 provisioned (1 paused), 2 serverless — returns type, status, endpoint, node config, encryption |
| list_databases | ✅ PASS | Provisioned returned 5 databases (dev, lab, mydb, mydb3, sample_data_dev); serverless returned 1 (dev) — includes owner, type, ACL, isolation level |
| list_schemas | ✅ PASS | Provisioned/sample_data_dev returned 6 schemas (information_schema, pg_catalog, public, tickit, tpcds, tpch); serverless/dev returned 3 — includes schema type, ACL, owner |
| list_tables | ✅ PASS | Provisioned/tickit returned 7 tables with diststyle, row counts, DML stats; serverless/public returned 2 tables with AUTO(ALL) diststyle |
| list_columns | ✅ PASS | Provisioned/tickit.sales returned 10 columns with distkey (listid), sortkey (dateid), encoding info; serverless/admin_table returned 2 columns with az64/lzo encoding |
| execute_query | ✅ PASS | Valid SELECT returned correct data on both provisioned (3 sales rows) and serverless (2 admin rows) with execution time and query ID |
| describe_execution_plan | ✅ PASS | Provisioned: returned 8-node plan with Hash Join DS_BCAST_INNER, table designs with column planner stats (n_distinct, correlation, most_common_vals), and 15 rule-based suggestions; serverless: returned simple Seq Scan plan with 1 suggestion |

### Protection Tests

| Test | Result | Details |
|---|---|---|
| Read-only protection (DROP TABLE) | ✅ BLOCKED | "ERROR: transaction is read-only" — on both provisioned and serverless |
| Read-only protection (DELETE) | ✅ BLOCKED | "ERROR: transaction is read-only" — confirmed on serverless |
| Transaction breaker (END; DROP) | ✅ BLOCKED | "SQL contains suspicious pattern, execution rejected" — caught before reaching the database |
| Failed user SQL (nonexistent table) | ✅ PROPER ERROR | "ERROR: relation \"tickit.nonexistent_table\" does not exist" — clean error propagation |
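The transaction-breaker case above is rejected before the SQL reaches the database. A minimal sketch of that kind of check follows, assuming a regex over statement boundaries. The pattern list and `looks_like_transaction_breaker` are illustrative. They are not the server's actual implementation, which the test output only describes as a "suspicious pattern" check.

```python
import re

# Illustrative patterns: a statement terminator followed by a keyword that
# would end the surrounding READ ONLY transaction or start a new one, or a
# statement that begins with a transaction-ending keyword.
_BREAKER = re.compile(
    r";\s*(?:END|COMMIT|ROLLBACK|ABORT|BEGIN|START\s+TRANSACTION)\b"
    r"|^\s*(?:END|COMMIT|ROLLBACK|ABORT)\b",
    re.IGNORECASE,
)


def looks_like_transaction_breaker(sql: str) -> bool:
    """Return True if the SQL appears to try to escape the wrapping transaction."""
    return bool(_BREAKER.search(sql))
```

A plain regex is deliberately conservative. It can also flag harmless text such as `';commit'` inside a string literal, so rejecting first and letting the user rephrase is the safer failure mode.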

### One-Line Per Tool Summary

- list_clusters — ✅ Discovers both provisioned and serverless clusters with full metadata including paused status detection.
- list_databases — ✅ Lists databases on both cluster types with ACL, isolation level, and local/shared type info.
- list_schemas — ✅ Returns schemas with type classification (local/external/shared) and permissions on both cluster types.
- list_tables — ✅ Returns tables with Redshift-specific metadata (diststyle, row counts, DML stats) on both provisioned and serverless.
- list_columns — ✅ Returns columns with encoding, distkey/sortkey positions; planner stats populated only via describe_execution_plan.
- execute_query — ✅ Executes SELECT safely in READ ONLY transactions; blocks DROP/DELETE with proper errors; rejects transaction-breaking SQL patterns before execution.
- describe_execution_plan — ✅ Returns structured plan nodes, table designs with column planner statistics (correlation, n_distinct, most_common_vals), human-readable EXPLAIN output, and rule-based optimization suggestions.
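The structured plan nodes mentioned above can be derived from the text that EXPLAIN returns. The parsing sketch below assumes the usual `XN <operation>  (cost=start..total rows=N width=W)` line shape. The sample line in the usage note is hand-written for illustration, and the `PlanNode` type is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

_NUM = r"\d+(?:\.\d+)?"
_NODE_RE = re.compile(
    r"^(?P<indent>\s*)(?:->\s*)?(?P<op>.+?)\s+"
    rf"\(cost=(?P<start>{_NUM})\.\.(?P<total>{_NUM})\s+"
    r"rows=(?P<rows>\d+)\s+width=(?P<width>\d+)\)"
)


@dataclass
class PlanNode:
    depth: int          # leading indentation, usable to rebuild the tree
    operation: str      # e.g. "XN Hash Join DS_BCAST_INNER"
    startup_cost: float
    total_cost: float
    rows: int
    width: int


def parse_plan_line(line: str) -> Optional[PlanNode]:
    """Parse one EXPLAIN line into a PlanNode, or return None for detail lines."""
    m = _NODE_RE.match(line)
    if m is None:
        return None  # e.g. "Hash Cond: ..." or "Sort Key: ..." lines
    return PlanNode(
        depth=len(m.group("indent")),
        operation=m.group("op"),
        startup_cost=float(m.group("start")),
        total_cost=float(m.group("total")),
        rows=int(m.group("rows")),
        width=int(m.group("width")),
    )
```

For example, `parse_plan_line("  ->  XN Seq Scan on sales s  (cost=0.00..3449.12 rows=344912 width=10)")` yields a node at depth 2 for `XN Seq Scan on sales s` with 344912 rows. Detail lines are skipped rather than treated as errors.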

 ▸ Credits: 1.58 • Time: 4m 15s




codecov Bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 94.36090% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.60%. Comparing base (63c4501) to head (836d2ba).
⚠️ Report is 73 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...mcp-server/awslabs/redshift_mcp_server/redshift.py | 93.24% | 4 Missing and 26 partials ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1927      +/-   ##
==========================================
+ Coverage   91.40%   92.60%   +1.19%
==========================================
  Files        1017      911     -106
  Lines       75524    73452    -2072
  Branches    12186    11721     -465
==========================================
- Hits        69036    68020    -1016
+ Misses       3997     3298     -699
+ Partials     2491     2134     -357
```
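The figures in the diff above are related by a simple ratio: tracked lines are hits + misses + partials, and coverage is hits divided by that total. The sketch below recomputes the reported percentages from the counts. It assumes, based on main showing 91.40% for a raw 91.409%, that Codecov truncates to two decimals rather than rounding. The function names are hypothetical.

```python
import math


def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Coverage as hits / (hits + misses + partials), in percent."""
    return 100.0 * hits / (hits + misses + partials)


def truncate2(value: float) -> float:
    """Cut a non-negative value to two decimals (assumed Codecov display rule)."""
    return math.floor(value * 100) / 100


main = coverage_percent(hits=69036, misses=3997, partials=2491)
head = coverage_percent(hits=68020, misses=3298, partials=2134)
print(f"main {truncate2(main):.2f}%  head {truncate2(head):.2f}%  "
      f"change {truncate2(head - main):+.2f}%")
# prints: main 91.40%  head 92.60%  change +1.19%
```

Because partials count as uncovered, converting the 26 partial lines in redshift.py to fully covered lines would raise patch coverage as much as covering the 4 missing ones.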


@saeedma8 changed the title from "Added get_execution_plan tool to Redshift MCP" to "feat(redshift-mcp-server): add get_execution_plan tool" Dec 10, 2025
@grayhemp
Contributor

Looking...

@saeedma8 force-pushed the main branch 5 times, most recently from 794efdd to 76d2479, December 24, 2025 02:47
Contributor

@grayhemp left a comment

@saeedma8 I have left my initial review comments to address, plus a general question to think about (we can brainstorm it on a call together). Generally, I like the approach you picked.

Comment thread src/redshift-mcp-server/README.md Outdated
Comment thread src/redshift-mcp-server/README.md Outdated
Comment thread src/redshift-mcp-server/README.md Outdated
Comment thread src/redshift-mcp-server/README.md Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/models.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/models.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/models.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/models.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/models.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/redshift.py Outdated
@github-actions
Contributor

This pull request is now marked as stale because it hasn't seen activity for a while. Add a comment or it will be closed soon. If you wish to exclude this issue from being marked as stale, add the "backlog" label.

@github-actions added the stale label Jan 17, 2026
@grayhemp
Contributor

WIP

@github-actions removed the stale label Jan 18, 2026
@saeedma8 force-pushed the main branch 2 times, most recently from 3aa8f6d to 45342ed, January 22, 2026 05:07
Contributor

@grayhemp left a comment

Additionally, a general request: I see that some tests unrelated to the subject of this PR are either changed or removed. Please make sure the CR changes only the relevant functionality and leaves unrelated tests intact.

Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/models.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/models.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/consts.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/consts.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/consts.py
Comment thread src/redshift-mcp-server/tests/test_redshift.py
Comment thread src/redshift-mcp-server/tests/test_redshift.py
Comment thread src/redshift-mcp-server/tests/test_redshift.py
Comment thread src/redshift-mcp-server/tests/test_redshift.py
Comment thread src/redshift-mcp-server/tests/test_redshift.py
@grayhemp
Contributor

grayhemp commented Feb 6, 2026

@saeedma8 let's also:

  • Add an example of a query explain under the Advanced Examples section in README.md please
  • In the description of this CR under "Testing explain only", please post a few more test examples showing various code paths.

Thanks.

@saeedma8 force-pushed the main branch 2 times, most recently from 10ad683 to 0bf3f81, February 17, 2026 03:47
@saeedma8 changed the title from "feat(redshift-mcp-server): add get_execution_plan tool" to "feat(redshift-mcp-server): add describe_execution_plan tool" Feb 17, 2026
@saeedma8 requested a review from grayhemp February 17, 2026 03:51
@saeedma8 force-pushed the main branch 2 times, most recently from 91d22f1 to 24aa1eb, February 17, 2026 04:26
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/consts.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/consts.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/consts.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/consts.py Outdated
Comment thread src/redshift-mcp-server/awslabs/redshift_mcp_server/consts.py Outdated
Comment thread src/redshift-mcp-server/tests/test_server.py Outdated
Comment thread src/redshift-mcp-server/tests/test_server.py
Comment thread src/redshift-mcp-server/tests/test_server.py
Comment thread src/redshift-mcp-server/tests/test_server.py
Comment thread src/redshift-mcp-server/tests/test_server.py
@grayhemp
Contributor

Reviewed and commented on a few changes in DM.

@saeedma8 force-pushed the main branch 7 times, most recently from 8c0693e to 4ba1082, April 10, 2026 07:13
```python
# Match native scans ("on <table>") and external/Spectrum scans where the
# table name follows directly or after "of" (PartitionInfo scans).
scan_pattern = regex.compile(
    r'(?:S3 Seq Scan|S3 Query Scan|Seq Scan PartitionInfo of|Seq Scan|Index Scan|Bitmap Heap Scan) (?:on )?(\S+)',
```
Contributor

Index Scan|Bitmap Heap Scan? Does RS have those?
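The quoted pattern can be exercised against a few plan lines to see exactly what it captures. The sketch uses the standard-library `re` module in place of the third-party `regex` module used in the PR, since the pattern needs no `regex`-only features. The sample lines are hand-written in the shape of native and Spectrum scan nodes, not captured from a cluster.

```python
import re

# Pattern copied from the diff above; stdlib `re` accepts it unchanged.
scan_pattern = re.compile(
    r'(?:S3 Seq Scan|S3 Query Scan|Seq Scan PartitionInfo of|Seq Scan|Index Scan|Bitmap Heap Scan) (?:on )?(\S+)'
)


def scanned_tables(plan_lines: list[str]) -> list[str]:
    """Return the table name captured from each scan node, in plan order."""
    tables = []
    for line in plan_lines:
        match = scan_pattern.search(line)
        if match:
            tables.append(match.group(1))
    return tables


# Hand-written sample lines (illustrative only).
sample = [
    "->  XN Seq Scan on sales s",
    "->  XN S3 Query Scan spectrum_sales",
    "->  XN Seq Scan PartitionInfo of spectrum.sales_part",
]
print(scanned_tables(sample))
```

Alternation order matters here. `Seq Scan PartitionInfo of` is listed before the plain `Seq Scan`, so the PartitionInfo line captures the table name rather than the word "PartitionInfo". The `Index Scan` and `Bitmap Heap Scan` alternatives only matter if Redshift ever emits such nodes, which is the open question in this thread.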

Comment on lines +189 to +197
```python
stats_sequential_scans: Optional[int] = Field(
    None, description='Number of sequential scans initiated'
)
stats_sequential_tuples_read: Optional[int] = Field(
    None, description='Number of live rows fetched by sequential scans'
)
stats_rows_inserted: Optional[int] = Field(None, description='Number of rows inserted')
stats_rows_updated: Optional[int] = Field(None, description='Number of rows updated')
stats_rows_deleted: Optional[int] = Field(None, description='Number of rows deleted')
```
Contributor

For the stat columns we need to clarify "since the last statistics reset" in the description.

Comment on lines +673 to +675
```python
    f'Could not fetch table extra stats (may not be supported on serverless): {table_info_error}'
)
# Continue without table info - tables already have None values
```
Contributor

I don't really like that we are "swallowing" a possible error here. Please double-check (and provide evidence) whether the query runs on both provisioned and serverless, as both a superuser and a non-superuser. If it runs in all 4 combinations, then a failure here is a real error and we should raise it.
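The reviewer's alternative to swallowing the failure can be sketched as follows. If the extra-stats query is confirmed to run in all four combinations, a failure is unexpected, so it is logged and re-raised with context instead of leaving fields as `None`. `fetch_table_extra_stats`, the `runner` callable, and `TableStatsError` are hypothetical placeholders, not the PR's code.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class TableStatsError(RuntimeError):
    """Hypothetical error raised when table statistics cannot be fetched."""


def fetch_table_extra_stats(runner: Callable[[str], list], sql: str) -> list:
    """Run the stats query and surface failures instead of returning partial data."""
    try:
        return runner(sql)
    except Exception as exc:
        # Log for diagnostics, then re-raise with the original error chained,
        # rather than silently continuing with None-valued fields.
        logger.error("Could not fetch table extra stats: %s", exc)
        raise TableStatsError(f"Could not fetch table extra stats: {exc}") from exc
```

Chaining with `from exc` keeps the original database error (for example, a permission error) visible to the caller and in tracebacks.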

```
- table_type: The type of the table (TABLE, VIEW, EXTERNAL TABLE, SHARED TABLE).
- remarks: Remarks about the table.

Redshift-specific properties (from pg_catalog, may be None on serverless):
```
Contributor

That doesn't seem to be true. Dist style and estimated rows should normally be available on serverless. Please double-check and provide evidence.

```
4. 'SHARED TABLE' types indicate tables from datashares.
5. Use table names for subsequent column discovery and query operations.
6. Consider table permissions (table_acl) for access planning.
7. Monitor table statistics and run ANALYZE when needed for accurate query planning.
```
Contributor

"run ANALYZE" should be "suggest that the user run ANALYZE"; you cannot run ANALYZE directly through the MCP server because of read-only mode.

```
1. First use list_clusters to get valid cluster identifiers.
2. Then use list_databases to get valid database names for the cluster.
3. Do NOT include EXPLAIN keyword in your SQL - the tool adds it automatically.
4. Run ANALYZE on tables before checking execution plans for accurate cost estimates.
```
Contributor

Again, "suggest that the user run ANALYZE"; see the comment above.

```
3. Do NOT include EXPLAIN keyword in your SQL - the tool adds it automatically.
4. Run ANALYZE on tables before checking execution plans for accurate cost estimates.
5. Compare plans before and after schema changes to measure optimization impact.
6. Use this tool to validate query performance before running expensive queries.
```
Contributor

"Ask the user if they want to ..."

```python
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```

Contributor

All of that is overkill. We just need to import BaseSchema, create a DummySchema(BaseSchema) class, and test that it drops None fields when dumped and when dumped as JSON.
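The minimal test the reviewer describes might look roughly like this. The real `BaseSchema` lives in the project (presumably in its models module) and its implementation is not shown in this thread. The sketch therefore defines a stdlib stand-in whose `model_dump` and `model_dump_json` methods drop `None` fields. Those method names mirror pydantic v2 on the assumption that `BaseSchema` is a pydantic model. In the real test, the stand-in would be replaced by importing `BaseSchema`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class BaseSchema:
    """Stand-in for the project's BaseSchema (assumed: drops None fields on dump)."""

    def model_dump(self) -> dict[str, Any]:
        return {k: v for k, v in asdict(self).items() if v is not None}

    def model_dump_json(self) -> str:
        return json.dumps(self.model_dump())


@dataclass
class DummySchema(BaseSchema):
    name: str = "x"
    note: Optional[str] = None


def test_base_schema_drops_none_fields() -> None:
    obj = DummySchema(name="t")
    assert obj.model_dump() == {"name": "t"}
    assert json.loads(obj.model_dump_json()) == {"name": "t"}
```

Under pytest the test function is collected automatically. The two assertions cover exactly the two dump paths the review asks for.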

Comment thread src/redshift-mcp-server/README.md Outdated
```
- Access permissions
- Remarks and metadata
- Access permissions and remarks
- Redshift-specific metadata (from pg_catalog tables, may be None on serverless):
```
Contributor

"may be None on serverless" - as well here.

Comment thread src/redshift-mcp-server/README.md Outdated
```
- Human-readable plan text (summarized for large plans)
- Rule-based performance optimization suggestions derived from plan analysis, table design, and column statistics

**Permissions note**: Column planner statistics from `pg_stats` require SELECT privilege on the table. Table activity statistics from `pg_stat_user_tables` are visible to all users regardless of permissions.
```
Contributor

Clarify on pg_stats "... for the statistics to be visible." so it won't create the impression that it will raise a permission error.


Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Include get_execution_plan tool on the Redshift MCP server

2 participants