smg is a SiteMap Generator for webapps.
The goal of the project is to generate custom sitemaps for any web app, regardless of framework or meta-framework, with a single CLI command and within build processes (since this can be automated with bash scripts and the config file).
- Next.js (lacking perfect implementation of some features)
This project is evolving from a scrappy project @cadecrow made for himself to get familiar with Rust. The original purpose was to generate sitemaps within a Next.js app router project. The docs and related code from the "initial commit" are evolving from this use case.
- Convert from an implementation meant only for Next.js projects to a CLI tool with a setup wizard. The setup wizard will let users configure smg based on their framework, router, etc.
A Next.js Sitemap Generator
A command-line tool written in Rust that scans your Next.js project to automatically generate both XML and JSON sitemaps. The tool analyzes your project's file structure, identifies all routes based on the App Router conventions, and creates sitemap files that can be used for SEO and navigation.
Stable Features:
- Automatically detects routes based on Next.js App Router conventions
- Generates standard sitemap.xml for search engines
- Creates a detailed sitemap.json with labels and descriptions for building navigation components
- Handles dynamic routes (e.g., [id], [...slug])
- Removes directories wrapped in parentheses from routes: app/(root)/path/page.tsx => /path
- Preserves file modification times for lastmod entries
- Preserves custom labels and descriptions for unchanged paths between successive executions of sitemap generation
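The last feature, keeping hand-edited labels and descriptions across runs, can be pictured with a small sketch. This is not smg's actual code: the `Entry` struct, the `carry_over` function, and the choice of the route string as the lookup key are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down sitemap.json entry (the real file has more fields).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    route: String,
    label: String,
    description: String,
}

/// Copies label and description from the previous run onto freshly
/// generated entries that share the same route (assumed matching key).
fn carry_over(previous: &[Entry], fresh: Vec<Entry>) -> Vec<Entry> {
    // Index the previous run by route for quick lookup.
    let old: HashMap<&str, &Entry> = previous
        .iter()
        .map(|e| (e.route.as_str(), e))
        .collect();

    fresh
        .into_iter()
        .map(|mut e| {
            if let Some(prev) = old.get(e.route.as_str()) {
                // Keep the user's edits instead of the generated defaults.
                e.label = prev.label.clone();
                e.description = prev.description.clone();
            }
            e
        })
        .collect()
}
```

Routes that did not exist in the previous run keep their generated defaults, and routes that disappeared are simply dropped because they are no longer in the fresh list.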
Testing or Under Development:
- Custom output files and path matching rules. See Advanced Usage
- see Scratch Notes for some very informal notes on the current state of development.
Contributions are welcome! Please feel free to submit a Pull Request. Please review Contributions & Requests for information on contributing, help to understand the project and its structure, and to see requests for help.
- Rust 1.70.0 or higher
# Clone the repository
git clone https://github.com/cadecrow/smg.git
cd smg
# Build the project
cargo build --release
From within the smg directory:
cargo install --path .
If you do not use cargo, you can add the Release Directory to your PATH or copy the Binary to a PATH directory. I will not explain this here. If you do not know what this means but you want to use one of these options, you'd be best served doing some googling or perplexitying to learn what this means.
smg [OPTIONS]
| Argument | Short | Description | Default |
|---|---|---|---|
| --project | -p | Path to your Next.js project | Current directory (.) |
| --xml-output | | Output path for sitemap.xml | sitemap.xml |
| --json-output | | Output path for sitemap.json | sitemap.json |
| --base-url | -b | Base URL for your website | https://example.com |
Generate sitemaps for a Next.js project in the current directory:
smg
Specify a different project directory and base URL:
smg --project ./my-nextjs-app --base-url https://mywebsite.com
Customize output file paths:
smg --xml-output ./public/sitemap.xml --json-output ./src/data/sitemap.json
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
<lastmod>2023-04-29T10:42:19Z</lastmod>
</url>
<url>
<loc>https://example.com/about</loc>
<lastmod>2023-04-28T15:30:00Z</lastmod>
</url>
</urlset>

[
{
"route": "/",
"path": "app/page.tsx",
"label": "Home",
"description": "",
"last_modified": "2023-04-29T10:42:19Z"
},
{
"route": "/about",
"path": "app/about/page.tsx",
"label": "About",
"description": "",
"last_modified": "2023-04-28T15:30:00Z"
}
]

- When the program starts, it looks for an smg.config.json file
- If the file exists, it loads the settings
- If the file doesn't exist, it creates one with default values
- Command line arguments take precedence over config file settings
- The merged configuration is used to run the program
This approach gives you flexibility:
- You can set project-specific defaults in the config file
- You can override specific settings via command line when needed
- The config file is automatically created with sensible defaults
The configuration file is particularly useful for CI/CD pipelines or when running the tool regularly, as users won't need to specify the same arguments each time.
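The precedence rule (command line over config file over defaults) can be sketched as below. This is an illustration, not smg's implementation: the `Settings` and `CliArgs` types and the `merge` function are hypothetical names, and the default values are the ones listed in the options table.

```rust
/// Hypothetical fully resolved settings, as loaded from smg.config.json.
#[derive(Debug, Clone, PartialEq)]
struct Settings {
    project: String,
    xml_output: String,
    json_output: String,
    base_url: String,
}

impl Default for Settings {
    /// Defaults taken from the options table.
    fn default() -> Self {
        Settings {
            project: ".".to_string(),
            xml_output: "sitemap.xml".to_string(),
            json_output: "sitemap.json".to_string(),
            base_url: "https://example.com".to_string(),
        }
    }
}

/// Hypothetical parsed command line: None means "flag not given".
#[derive(Debug, Default)]
struct CliArgs {
    project: Option<String>,
    xml_output: Option<String>,
    json_output: Option<String>,
    base_url: Option<String>,
}

/// Any flag given on the command line wins; otherwise the file value is kept.
fn merge(file: Settings, cli: CliArgs) -> Settings {
    Settings {
        project: cli.project.unwrap_or(file.project),
        xml_output: cli.xml_output.unwrap_or(file.xml_output),
        json_output: cli.json_output.unwrap_or(file.json_output),
        base_url: cli.base_url.unwrap_or(file.base_url),
    }
}
```

Modeling each flag as an `Option` keeps "not specified" distinct from "specified with the default value", which is what lets the config file supply project-specific defaults.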
- The tool follows Next.js App Router conventions, looking for page.tsx files
- Accounts for folders with names contained within parens, e.g. app/(layout-group)/blog/page.tsx => /blog
- Dynamic routes (with [brackets]) are detected and included in the sitemap
- Routes in directories starting with underscore (_) or inside api directories are excluded
- You can edit the generated JSON to add custom descriptions and labels
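The path rules above can be summarized in a short sketch. It is not the tool's real code: `route_from_page_path` is a hypothetical helper, it assumes paths are relative to the project root with the app directory at the top level, and it assumes dynamic segments keep their brackets in the resulting route.

```rust
/// Hypothetical helper: maps a project-relative file path to a route.
/// Returns None when the file should not appear in the sitemap.
fn route_from_page_path(path: &str) -> Option<String> {
    let mut segments: Vec<&str> = path.split('/').collect();

    // Only page.tsx files define routes.
    if segments.pop()? != "page.tsx" {
        return None;
    }
    // Assumption: the App Router root is a top-level "app" directory.
    if segments.first().copied() != Some("app") {
        return None;
    }

    let mut route = String::new();
    for seg in &segments[1..] {
        // Underscore-prefixed directories and api directories are excluded.
        if seg.starts_with('_') || *seg == "api" {
            return None;
        }
        // Directories wrapped in parentheses do not contribute a segment.
        if seg.starts_with('(') && seg.ends_with(')') {
            continue;
        }
        route.push('/');
        route.push_str(seg);
    }

    // app/page.tsx maps to the site root.
    if route.is_empty() {
        route.push('/');
    }
    Some(route)
}
```

For example, under these assumptions `app/(layout-group)/blog/page.tsx` maps to `/blog`, while `app/_components/page.tsx` and `app/api/users/page.tsx` produce no route.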
You can create an smg.config.json file in your project root to set default options:
{
"project": ".",
"xml_output": "public/sitemap.xml",
"json_output": "src/data/sitemap.json",
"base_url": "https://mywebsite.com"
}

If no configuration file exists, the tool will create one with default values when first run.
REMEMBER: Command line arguments will always override settings in the configuration file.
The smg.config.json file supports advanced path filtering and custom sitemap generation:
You can exclude specific routes from both the sitemap.xml and sitemap.json sitemaps:
{
"excluded_routes": {
"exact": ["/admin", "/login"],
"children": ["/drafts"],
"patterns": ["^/temp-.*$"]
}
}

- exact: Matches exact routes
- children: Matches the specified route and all its children
- patterns: Matches routes using regular expressions
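To make the difference between exact and children concrete, here is a minimal sketch using only the standard library. The `RouteRules` type and its methods are hypothetical, and the sketch assumes that "children" means path segments below the given route, so /drafts-archive is not treated as a child of /drafts. The patterns rule is left out because regular expressions need an external crate.

```rust
/// Hypothetical in-memory form of the "exact" and "children" lists.
struct RouteRules {
    exact: Vec<String>,
    children: Vec<String>,
}

/// True when `route` is `base` itself or sits below it on a segment boundary.
fn is_same_or_child(route: &str, base: &str) -> bool {
    if route == base {
        return true;
    }
    let prefix = base.trim_end_matches('/');
    route
        .strip_prefix(prefix)
        .map_or(false, |rest| rest.starts_with('/'))
}

impl RouteRules {
    /// True when the route is matched by any exact or children rule.
    fn matches(&self, route: &str) -> bool {
        self.exact.iter().any(|r| r == route)
            || self.children.iter().any(|b| is_same_or_child(route, b))
    }
}
```

With the example config above, /admin matches through exact but /admin/users does not, while both /drafts and /drafts/post-1 match through children.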
You can generate additional sitemap files for specific groups of routes. You can also specify whether or not these routes are added to the main sitemap.json and sitemap.xml files.
{
"custom_sitemaps": {
"blog": {
"output": "blog_sitemap.json",
"include_in_main_json": true,
"include_in_main_xml": false,
"routes": {
"exact": ["/blog"],
"children": ["/posts"],
"patterns": ["^/articles/.*$"]
}
},
"products": {
"output": "products_sitemap.json",
"include_in_main_json": false,
"include_in_main_xml": true,
"routes": {
"exact": [],
"children": ["/products"],
"patterns": []
}
}
}
}

For each custom sitemap:
- output: Path where the custom sitemap JSON will be saved
- include_in_main_json: Whether to include matching routes in the main JSON sitemap
- include_in_main_xml: Whether to include matching routes in the main XML sitemap
- routes: Route patterns to include in this custom sitemap
When routes match multiple rules:
!!! Note: conflict resolution is currently buggy. I (or an awesome contributor such as maybe... yourself) will fix it and update this.
If a custom sitemap sets inclusion in the main sitemap to true, it will override excluded routes.
You can still use this just fine for creating custom sitemaps and excluding paths. The bug is just that a conflict in custom_sitemaps will override excluded_routes.
Intended behavior:
If a path is in excluded_routes, it won't appear in the main sitemaps, regardless of the rules in custom_sitemaps.
If a path matches multiple custom sitemaps, it will appear in all matching custom sitemap files.
A path will only appear in the main sitemaps if all its matching custom sitemaps have the respective include_in_main_* flag set to true. In other words, if any custom sitemap has include_in_main_* = false, then that path will not make it to the main sitemap file.
Current behavior:
If a path is in excluded_routes, it won't appear in the main sitemaps, BUT rules in custom_sitemaps will override this.
If a path matches multiple custom sitemaps, it will appear in all matching custom sitemap files.
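The stricter variant of the rules above (excluded_routes always wins, and every matching custom sitemap must opt in) can be sketched as follows. This models the rule as described, not the tool's current buggy behavior, and the `RouteMatch` type and `appears_in_main` function are hypothetical names.

```rust
/// Hypothetical summary of how one route relates to the config rules.
struct RouteMatch {
    /// True when the route matches excluded_routes.
    excluded: bool,
    /// The include_in_main_* flag of every custom sitemap the route matches.
    include_flags: Vec<bool>,
}

/// A route reaches the main sitemap only if it is not excluded and
/// every matching custom sitemap has its include flag set to true.
fn appears_in_main(m: &RouteMatch) -> bool {
    // `all` returns true for an empty list, so routes that match
    // no custom sitemap are included by default.
    !m.excluded && m.include_flags.iter().all(|&flag| flag)
}
```

The check would run once with the include_in_main_json flags and once with the include_in_main_xml flags, since the two main files are decided independently.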
Here's a complete example of what the configuration file might look like:
{
"project": ".",
"xml_output": "public/sitemap.xml",
"json_output": "public/sitemap.json",
"base_url": "https://example.com",
"excluded_routes": {
"exact": ["/admin", "/login", "/logout"],
"children": ["/internal"],
"patterns": ["^/temp-.*$", "^/draft-.*$"]
},
"custom_sitemaps": {
"blog": {
"output": "public/blog_sitemap.json",
"include_in_main_json": true,
"include_in_main_xml": true,
"routes": {
"exact": ["/blog"],
"children": ["/posts"],
"patterns": ["^/articles/.*$"]
}
},
"products": {
"output": "public/products_sitemap.json",
"include_in_main_json": false,
"include_in_main_xml": true,
"routes": {
"children": ["/products"]
}
},
"docs": {
"output": "public/docs_sitemap.json",
"include_in_main_json": false,
"include_in_main_xml": false,
"routes": {
"children": ["/docs", "/guides"]
}
}
}
}