This plugin brings AWS data engineering expertise directly into your coding assistant, covering the full data lifecycle across AWS Analytics services. Skills are currently provided for the following capability areas:
- Data Lake Operations — Build and operate a data lake on AWS: create managed Iceberg tables on Amazon S3 Tables, ingest data from diverse sources (S3, JDBC databases, Snowflake, BigQuery, DynamoDB, AWS Glue catalog tables), and query across default and federated catalogs with Amazon Athena.
- Data Discovery — Inventory and audit your AWS Glue Data Catalog across S3 Tables, Amazon Redshift-federated, and remote Iceberg catalogs. Resolve data asset references by name, keyword, column, or reverse-lookup from S3 location metadata in the catalog.
- Vector Storage — Store and query vector embeddings using Amazon S3 Vectors for cost-effective semantic search and RAG workloads.
- External Connectivity — Create and troubleshoot AWS Glue connections to JDBC databases (Oracle, SQL Server, PostgreSQL, MySQL, RDS, Aurora), Amazon Redshift, Snowflake, and BigQuery.

| # | Skill | Description | Documentation |
|---|---|---|---|
| 1 | `creating-data-lake-table` | Create managed Iceberg tables using Amazon S3 Tables with automatic compaction, AWS Glue catalog registration, and partitioning | SKILL.md |
| 2 | `ingesting-into-data-lake` | Import data from S3 files, JDBC databases, Snowflake, BigQuery, DynamoDB, or existing AWS Glue catalog tables into S3 Tables or standard Iceberg | SKILL.md |
| 3 | `querying-data-lake` | Execute and manage Athena SQL queries across default and federated catalogs (AWS Glue, S3 Tables, Amazon Redshift) | SKILL.md |
| 4 | `finding-data-lake-assets` | Resolve data lake asset references across AWS Glue Data Catalog, S3, S3 Tables, and Amazon Redshift by name, keyword, column, or S3 path | SKILL.md |
| 5 | `exploring-data-catalog` | Full inventory and audit of AWS Glue Data Catalog assets across S3 Tables, Amazon Redshift-federated, and remote Iceberg catalogs | SKILL.md |
| 6 | `storing-and-querying-vectors` | Store and query vector embeddings using Amazon S3 Vectors for semantic search and RAG workloads | SKILL.md |
| 7 | `connecting-to-data-source` | Create and troubleshoot AWS Glue connections to JDBC databases, Amazon Redshift, Snowflake, and BigQuery | SKILL.md |

| # | Server | Description |
|---|---|---|
| 1 | `aws-mcp` | AWS API access, documentation search, and SOP retrieval via AWS MCP Server |

See Quick Start.
The data lake skills cover the jobs-to-be-done for building and operating a data lake on AWS. They follow AWS best practices as agent-readable instruction packages, guiding you from table creation through ingestion and querying.
- Create tables - The `creating-data-lake-table` skill sets up managed Iceberg tables on Amazon S3 Tables with automatic compaction, snapshot management, AWS Glue catalog registration, partitioning, and IAM access control.
- Ingest data - The `ingesting-into-data-lake` skill moves data from local files, S3, JDBC databases (Oracle, SQL Server, PostgreSQL, MySQL, RDS, Aurora, Amazon Redshift), Snowflake, BigQuery, DynamoDB, or existing AWS Glue catalog tables into your data lake. Supports one-time loads, recurring pipelines, and migrations.
- Query data - The `querying-data-lake` skill executes Athena SQL queries across default and federated catalogs, with workgroup selection, statement classification, cost tracking, and error recovery.
- "Create an Iceberg table for our order events with daily partitioning"
- "Import our PostgreSQL sales data into the data lake"
- "Query the top 10 customers by revenue from our analytics table"
- "Migrate our existing Hive tables to Iceberg on S3 Tables"
The discovery skills help you understand what data exists in your AWS account and find specific assets quickly.
- `exploring-data-catalog` - Full inventory and audit across AWS Glue Data Catalog, S3 Tables, Amazon Redshift-federated, and remote Iceberg catalogs. Maps your data landscape, flags stale tables, and suggests improvements.
- `finding-data-lake-assets` - Resolves fuzzy data references ("our orders table", "the sales dataset") to concrete catalog entries using layered search across AWS Glue, S3, S3 Tables, and Amazon Redshift.
- "What data do we have in our account?"
- "Inventory all catalogs and databases"
- "Find the table that has customer_id"
- "Where is our quarterly revenue data?"
The storing-and-querying-vectors skill provides cost-effective vector embedding storage and retrieval using Amazon S3 Vectors, optimized for long-term storage with subsecond query latency.
- "Create a vector index for our product embeddings"
- "Store these document embeddings for RAG"
- "Find the most similar items to this query vector"
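
To make "most similar" concrete, here is a minimal brute-force sketch of similarity ranking in plain Python. It does not call Amazon S3 Vectors or reproduce what the skill does; cosine similarity is assumed as the metric, and the keys and vectors are made-up illustration data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=2):
    """Return the k (key, score) pairs most similar to `query`, best first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical product embeddings, shortened to two dimensions for readability.
products = {"mug": [1.0, 0.0], "lamp": [0.0, 1.0], "cup": [1.0, 1.0]}
for key, score in top_k([1.0, 0.0], products):
    print(key, round(score, 3))
```

A full scan like this only works for small collections; it is meant to show the ranking idea, not how a managed vector store executes the query.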
The connecting-to-data-source skill creates and troubleshoots AWS Glue connections to external databases. It discovers existing connections and candidate sources in your account, registers credentials securely via Secrets Manager or IAM DB auth, configures VPC networking, and tests end-to-end connectivity.
- "Connect to our Oracle production database"
- "Set up an AWS Glue connection to Snowflake"
- "Test my existing BigQuery connection"
- "Troubleshoot the connection timeout on my RDS connection"
In your local environment, configure AWS credentials and set your target region to get started.
- An AWS account with access to AWS Analytics services (AWS Glue, Athena, S3 Tables, S3 Vectors)
- Local AWS credentials and config
- uv (for MCP server)
Configure AWS credentials using one of the following methods:
- AWS CLI - Run `aws configure` (IAM credentials) or `aws sso login` (IAM Identity Center)
- Environment variables - Set `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`. See Configuring environment variables for details.
Your IAM role needs permissions for the AWS services used by the skills you install. The relevant IAM action namespaces are:
- `athena` - Query execution and workgroup management
- `glue` - Data Catalog operations and ETL jobs
- `s3` - Object storage operations
- `s3tables` - Managed Iceberg table operations (separate from `s3`)
- `s3vectors` - Vector storage operations (separate from `s3`)
Scope permissions to the resources your workload uses.
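
One way to keep a policy scoped is to build it per namespace and reject actions that land in the wrong one. The sketch below uses only Python's standard library. The `build_policy` helper, the chosen actions, the account ID, and the ARNs are illustrative assumptions, not the permission set this plugin requires; check each action against the skills you actually install.

```python
import json

# The five IAM action namespaces used by the skills.
NAMESPACES = ("athena", "glue", "s3", "s3tables", "s3vectors")


def build_policy(grants):
    """Build an IAM policy document from {namespace: (actions, resources)}.

    Raises ValueError for an unknown namespace, an action outside its
    namespace, or a grant that lists no resources.
    """
    statements = []
    for namespace, (actions, resources) in grants.items():
        if namespace not in NAMESPACES:
            raise ValueError(f"unexpected namespace: {namespace}")
        if not resources:
            raise ValueError(f"{namespace}: list at least one resource ARN")
        for action in actions:
            # "s3tables:..." does not match the "s3:" prefix, so the
            # separate namespaces cannot be mixed up by accident.
            if not action.startswith(namespace + ":"):
                raise ValueError(f"{action} is not in the {namespace} namespace")
        statements.append({
            "Sid": namespace.capitalize() + "Access",
            "Effect": "Allow",
            "Action": list(actions),
            "Resource": list(resources),
        })
    return {"Version": "2012-10-17", "Statement": statements}


# Hypothetical grants: placeholder account ID, workgroup, and bucket names.
example = build_policy({
    "athena": (["athena:StartQueryExecution", "athena:GetQueryResults"],
               ["arn:aws:athena:us-east-1:111122223333:workgroup/primary"]),
    "s3": (["s3:GetObject"], ["arn:aws:s3:::amzn-s3-demo-bucket/*"]),
})
print(json.dumps(example, indent=2))
```

Requiring an explicit resource list for every namespace makes it harder to fall back to `"Resource": "*"` without noticing.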
- Set `AWS_DEFAULT_REGION` to your preferred AWS region (e.g., `us-east-1`). See Configuring environment variables for details.
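
If you use the environment-variable method, a quick preflight check can catch a missing setting before the first skill runs. The sketch below is a hypothetical helper (`check_aws_env` is not part of this plugin) and only inspects environment variables, so it will report problems that do not apply if your credentials come from `aws configure` or `aws sso login`. It treats `AWS_SESSION_TOKEN` as required only for temporary credentials, whose access key IDs start with `ASIA`.

```python
import os

REQUIRED_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def check_aws_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not env.get(key)]
    # Temporary credentials (access key IDs starting with "ASIA") also need a session token.
    if env.get("AWS_ACCESS_KEY_ID", "").startswith("ASIA") and not env.get("AWS_SESSION_TOKEN"):
        problems.append("AWS_SESSION_TOKEN is not set for temporary credentials")
    if not env.get("AWS_DEFAULT_REGION"):
        problems.append("AWS_DEFAULT_REGION is not set")
    return problems


if __name__ == "__main__":
    for problem in check_aws_env():
        print("warning:", problem)
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to exercise without touching your real shell environment.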
The skills in this plugin follow AWS best practices, but they are fully customizable. You can fork the repository and modify any SKILL.md to reflect your organization's standards, naming conventions, approved data formats, or internal tooling. Workspace-level skills take precedence over global skills, so teams can maintain their own versions without affecting other users.
- AWS Analytics Services
- Amazon S3 Tables
- Amazon S3 Vectors
- Amazon Athena User Guide
- AWS Glue Developer Guide
- Agent Skills open standard — Anthropic
- AWS Agent Toolkit for AWS
This project is licensed under the Apache 2.0 License.