Content Structure

Parsed markdown is stored in SQLite tables that can be used for rendering and database content management.

Deepwiki

Install

prerequisites

choco
node-gyp see https://github.com/nodejs/node-gyp?tab=readme-ov-file#on-windows

Usage

npm install content-structure

this project is a library, for full config, see the example in example/parse.js

collect all data by running this once

import {collect} from 'content-structure'

await collect({
    rootdir:rootdir,
    contentdir:join(rootdir,"content"),
    file_link_ext:["svg","webp","png","jpeg","jpg","xlsx","glb"],
    outdir:join(rootdir,".structure")
})

see demo with

>pnpm run demo
> node parse.js

content_dir : C:\dev\MicroWebStacks\content-structure\example\content
   searching for files with extensions : *.md
Structure DB tables and row counts:
  - asset_info: 19
  - assets: 19
  - blob_store: 14
  - documents: 30
  - items: 82

Documentation

Content Structure produces a relational snapshot of every markdown run using the schema declared in catalog.yaml.
The catalog defines a single structure dataset whose tables are optimized for rendering, search indexing, and asset management. Each run populates these tables under .structure/structure.db.

Table overview

Table	Purpose	Relationships
`documents`	Canonical row per markdown entry. Stores stable ids, routing metadata, and leftover front matter via the `meta_data` JSON column.	`items`, `assets`, and `asset_info` reference `documents.sid`.
`items`	Flattened AST stream in reading order. Each row keeps `body_text` for simple rendering plus an optional serialized AST subtree for nested constructs (stored in `ast`).	References `documents` via `doc_sid`; `assets` rows connect items to blobs when an AST node produces a file.
`assets`	Run-specific join table so consumers can tell which document referenced which asset at a given `version_id`.	Bridge between `documents` and `asset_info`; also carries the `blob_uid` for quick payload lookups.
`asset_info`	Deduplicated description of every asset (code blocks, tables, linked files, etc.) regardless of run.	Points to the owning document (`parent_doc_uid`) and the physical payload via `blob_uid`.
`blob_store`	Source of truth for payloads. Large blobs are stored under `blobs/YYYY/MM/ff/hash` and referenced by path, while small blobs inline their bytes (compressed when eligible).	`asset_info`/`assets` link to blobs through `blob_uid`.

The catalog is intentionally compact: fields are named to match DOM concerns (slug, url_type, level), content analysis (headings, links, code), and asset lifecycle (first_seen, last_seen). Instead of memorizing every column, browse catalog.yaml whenever you need the exact types or to extend the dataset. Downstream tools can rely on the catalog as the authoritative contract when generating queries, migrations, or analytics dashboards.

Document behavior highlights

Metadata folding – Any front matter not mapped to a declared column is serialized into documents.meta_data, keeping schemas manageable without losing context.
Automatic ordering – Documents inherit incremental order values scoped to their directory level unless you pin them explicitly. This keeps navigation menus stable even when markdown files are added later.
Mixed routing – Folder-style (readme.md or matching filenames) and file-style URLs coexist. url_type reveals which variant was used to generate the url.

Item and asset lifecycle

Paragraphs, headings, tables, code blocks, and images are all represented in items. Simple rows expose fully extracted text; nested structures store their sanitized AST so you can re-render bold or embedded assets without reparsing the original markdown.
Every asset mentioned by an item produces two entries: a durable definition in asset_info and a run-scoped membership row in assets. The membership row ties the asset to both the document and its blob so you can know exactly when something was added, removed, or reused.
Blob payloads avoid bloat with configurable thresholds: large files stream to disk under blobs/, while smaller text blobs can be compressed inline and served straight from SQLite.

Refer back to the catalog for exhaustive field notes, and treat the tables above as the primary contract between your markdown source and any rendering or analytics layers.

Config parameters

the config parameter is optional and do have default values

rootdir : defaults to current working directory. The path where to find the a content directory.
outdir : defaults to .structure. Relative output directory is the location where all output data will be generated, which is relative to the root directory.
folder_single_doc : defaults to false. When true, each folder is treated as a single document and the first YAML/YML file contributes overrides plus meta_data fields.
external_storage_kb : defaults to 512. Blobs larger than this size (in KB) are written to disk under blobs/<YYYY>/<MM>/<prefix>/<hash>.
inline_compression_kb : defaults to 32. Inline blobs bigger than or equal to this size are eligible for gzip compression before being stored inside the blob_store table.
file_compress_ext : defaults to ["txt","md","json","csv","tsv","yaml","yml"]. Inline blobs are compressed only if their source extension (when known) appears in this list.

Generated output

.structure/structure.db : a SQLite database (powered by better-sqlite3). The database exposes the tables documents, items, assets, asset_info, and blob_store.
blobs/year/month/prefix/hash path for all files larger than config.external_storage_kb

Name		Name	Last commit message	Last commit date
Latest commit History 128 Commits
example		example
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
catalog.yaml		catalog.yaml
design.drawio.svg		design.drawio.svg
index.js		index.js
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Repository files navigation

Content Structure

Deepwiki

Install

Usage

Documentation

Table overview

Document behavior highlights

Item and asset lifecycle

Config parameters

Generated output

About

Uh oh!

Sponsor this project

Uh oh!

Uh oh!

Languages

Uh oh!

License

MicroWebStacks/content-structure

Folders and files

Latest commit

History

Repository files navigation

Content Structure

Deepwiki

Install

Usage

Documentation

Table overview

Document behavior highlights

Item and asset lifecycle

Config parameters

Generated output

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Sponsor this project

Uh oh!

Uh oh!

Languages