docs: graduate plugins out of experimental mode by johnnygreco · Pull Request #603 · NVIDIA-NeMo/DataDesigner

johnnygreco · 2026-05-04T18:50:01Z

Summary

Updates the docs around two related areas: plugin authoring now that the extension points are no longer experimental, and the code reference section so APIs are grouped by package/layer and have enough context to be useful from the docs site.

Changes

Added

docs/plugins/build_your_own.md as the consolidated guide for building column generator, seed reader, and processor plugins.
docs/plugins/models.md for model-backed plugin patterns and model registry usage.
Package/layer-oriented code reference pages under docs/code_reference/config/, docs/code_reference/engine/, and docs/code_reference/interface/, with overview pages for each group.
__init__.py files in engine resources and processing subpackages so mkdocstrings/griffe can discover seed reader and processor classes.

Changed

Reworked the Plugins nav and overview around Overview, Build Your Own, Using Models, and Available Plugins.
Embedded the NVIDIA-maintained plugin catalog table from the DataDesignerPlugins repo in docs/plugins/available.md.
Reorganized the Code Reference nav in mkdocs.yml by Config, Engine, and Interface, with updated cross-links from concepts and recipes.
Expanded and corrected docstrings for plugin extension points, config objects, generators, seed readers, processors, interface classes, and analysis/config references so generated docs render with useful field and method descriptions.
Improved code reference table styling so wide generated tables remain readable on narrower viewports.

Removed

Replaced the older plugin example pages (example.md, filesystem_seed_reader.md, processor.md) with the consolidated Build Your Own guide and targeted reference pages.
Replaced the older flat code reference pages with package-grouped code reference pages.

Attention Areas

Reviewers: Please pay special attention to the following:

mkdocs.yml - navigation moved from flat reference pages to package groups, and remote markdown snippets are now enabled for the plugin catalog table.
docs/plugins/available.md - the NVIDIA plugin table is pulled from the raw DataDesignerPlugins catalog during docs builds.
docs/code_reference/ - page paths and anchors changed as part of the Config / Engine / Interface split.
packages/data-designer-*/src/data_designer/ docstrings and discovery shims - these are mostly documentation-facing changes, but they affect what mkdocstrings exposes in the generated docs.

Validation

uv run --group docs mkdocs build

Description updated with AI

github-actions · 2026-05-04T18:51:23Z

Docs preview: https://2674aef1.dd-docs-preview.pages.dev

Notebook tutorials are placeholder-only in previews.

github-actions · 2026-05-04T18:52:14Z

Review: PR #603 — `docs: graduate plugins out of experimental mode`

Summary

This PR reorganizes plugin documentation now that the three plugin extension points (column generator, seed reader, processor) are stable. It removes the "Experimental Feature" banners, replaces the single plugins/example.md walkthrough with per-type implementation guides (implement.md, processor.md, expanded filesystem_seed_reader.md), and adds proper API reference pages (code_reference/plugins.md, code_reference/generators.md). The plugins/available.md page becomes a real "Catalog" that points to the NVIDIA-maintained DataDesignerPlugins repo.

Code changes are minimal and non-functional:

Docstrings added to Plugin, PluginType, SingleColumnConfig.allow_resize, ProcessorConfig.processor_type, SeedSource, FileSystemSeedSource, ColumnGeneratorCellByCell, and ColumnGeneratorFullColumn.
A typo fix in Plugin.config_qualified_name's description ("name o the" → "name of the").
Three empty __init__.py files under data_designer.engine so griffe/mkdocstrings can resolve SeedReader, FileSystemSeedReader, and Processor for the new reference pages.
mkdocs.yml nav reshuffle + alphabetizing of Code Reference pages.

Scope matches what the PR description claims.

Findings

Correctness

__init__.py additions respect the namespace-package invariant. AGENTS.md pins the PEP 420 rule at the top-level data_designer namespace. The three new files live under data_designer.engine.resources, data_designer.engine.processing, and data_designer.engine.processing.processors, all inside a single distribution package — this is the normal way to expose subpackages for griffe and does not break the cross-distribution namespace merge. No concern.
Nav + anchors cross-link correctly. Spot-checked docs/code_reference/plugins.md {#data_designer.plugins.plugin.Plugin} anchor against the references in docs/plugins/overview.md, docs/plugins/implement.md, docs/code_reference/generators.md, and the column_configs.md addition for SingleColumnConfig. All the page-relative links (../plugins/implement.md, ../code_reference/generators.md, etc.) match the new filenames in this PR.
Plugin description typo fix matches the docstring fix. Both say "name of the …" now. Good.
Env-var documentation. overview.md states DISABLE_DATA_DESIGNER_PLUGINS=true disables entry point discovery. Verify the name matches the actual variable in the discovery code — docs of this kind rot silently if the flag is renamed.
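For readers checking the env-var item, this is roughly the shape of gate to look for in the discovery code. It is a minimal sketch, not the project's implementation: the entry point group name and the set of accepted truthy values are assumptions, and only the variable name DISABLE_DATA_DESIGNER_PLUGINS comes from the docs under review.

```python
import os
from importlib.metadata import entry_points

# Variable name taken from the docs under review.
DISABLE_FLAG = "DISABLE_DATA_DESIGNER_PLUGINS"
# Hypothetical group name, for illustration only; the real group may differ.
PLUGIN_ENTRY_POINT_GROUP = "data_designer.plugins"


def plugins_disabled(environ=None):
    """Return True when the disable flag holds a truthy string.

    Accepting "1"/"true"/"yes" is an assumption; the docs only show "true".
    """
    env = os.environ if environ is None else environ
    return env.get(DISABLE_FLAG, "").strip().lower() in {"1", "true", "yes"}


def discover_plugin_entry_points(environ=None):
    """Return plugin entry points, or an empty list when discovery is disabled."""
    if plugins_disabled(environ):
        return []
    # The group= keyword requires Python 3.10 or newer.
    return list(entry_points(group=PLUGIN_ENTRY_POINT_GROUP))
```

If the flag is ever renamed, a grep for the string held in DISABLE_FLAG against both the docs and the discovery module is the quickest consistency check.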

Example code quality

Processor example uses astype(str).apply(lambda …) in both implement.md and processor.md. Idiomatic pandas would be data[self.config.column].astype(str).str.contains(self.config.pattern, regex=True); pre-compiling the pattern is unnecessary when using the Series accessor. As a "minimum working example" it's fine, but a short note that vectorized .str.contains is preferable for real workloads would help new plugin authors.
get_column_emoji() returns "x" in implement.md where the old example.md used "✖️". Intentional simplification is fine, but x looks like a placeholder — consider a real emoji so readers don't copy a bare letter into their log output.
Import style is consistent across the three tab examples: from __future__ import annotations + TYPE_CHECKING for pandas when only used in annotations. Good — this mirrors the style guide's fast-import guidance.
Multiple-plugins-per-package section dropped the tests_e2e reference. The removed example.md pointed at tests_e2e/ as a concrete example of this pattern; implement.md's "Multiple plugins in one package" section just shows a TOML snippet. If that e2e directory is still a working example, add the link back — it's a cheap pointer that saves plugin authors from guessing.
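To make the processor suggestion concrete, here is a small side-by-side of the two filtering styles. It is a sketch on a toy DataFrame, not the code from implement.md: the function names and sample data are made up, and na=False is added here only so missing values never produce a non-boolean mask.

```python
import re

import pandas as pd


def filter_with_apply(data: pd.DataFrame, column: str, pattern: str) -> pd.DataFrame:
    """Row filter in the style the docs example uses: compile, then apply a lambda."""
    compiled = re.compile(pattern)
    mask = data[column].astype(str).apply(lambda value: bool(compiled.search(value)))
    return data[mask].reset_index(drop=True)


def filter_with_str_contains(data: pd.DataFrame, column: str, pattern: str) -> pd.DataFrame:
    """Same filter using the vectorized Series.str accessor."""
    mask = data[column].astype(str).str.contains(pattern, regex=True, na=False)
    return data[mask].reset_index(drop=True)


frame = pd.DataFrame({"text": ["alpha-1", "beta", "gamma-22", "delta"]})
kept_apply = filter_with_apply(frame, "text", r"-\d+$")
kept_vectorized = filter_with_str_contains(frame, "text", r"-\d+$")
print(kept_vectorized["text"].tolist())
```

Both functions keep the same rows; the accessor version is shorter and avoids a Python-level call per row.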

Documentation accuracy

assert_valid_plugin coverage. implement.md says "Data Designer provides a testing utility for common plugin structure checks" and shows a single example. The deleted example.md explicitly listed what it validates ("config is subclass of ConfigBase", etc.). The new, terser wording is fine for most readers, but the deleted enumeration was genuinely useful for the "what will this catch?" question. Worth preserving one sentence about it.
Discovery troubleshooting bullets in overview.md are good and concrete (discriminator must be a string, regex-filter → REGEX_FILTER, etc.). This replaces an entire "Experimental" framing with something actionable — nice improvement.
Processor callback table in processor.md accurately lines up with the three process_* methods. The async-engine caveat note about row-count-changing pre/post-batch processors under DATA_DESIGNER_ASYNC_ENGINE=1 is a useful landmine callout.

Style / conventions

Docstring additions follow the existing Attributes-block format used elsewhere in base.py and plugin.py. No drift.
New __init__.py files use the right SPDX header (2026, Apache-2.0). Consistent with other engine packages.
mkdocs.yml alphabetization applies to Code Reference only, not to the top-level nav, which matches the comment "# Keep code reference pages ordered alphabetically by nav label." Confirm whether "Plugins" in the top-level nav should similarly list Catalog before Build Your Own (it currently goes Overview → Build Your Own → Catalog, which reads as a natural user journey and is probably preferable to alphabetical).

Risks

Low. This is almost entirely docs plus docstrings. The only runtime-observable change is the three new __init__.py files; because they are beneath a single installable package and do not introduce a data_designer/__init__.py, they cannot break the cross-package namespace merge.
One residual risk: if any tooling elsewhere in the repo relied on those three directories being implicit namespace packages (unlikely but worth a grep for pkgutil/find_namespace_packages usage around data_designer.engine.processing), it should still work — explicit subpackages are a strict superset of namespace behavior inside a single distribution.
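The namespace-versus-regular distinction behind this risk assessment can be observed with the standard library alone. The sketch below uses throwaway package names in a temporary directory; it shows only how Python's import system classifies the two layouts and says nothing about griffe's own resolution logic.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def describe_package(root: Path, name: str) -> str:
    """Import `name` from `root` and report whether it is a namespace or regular package."""
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(name)
        # PEP 420 namespace packages have no __init__.py, so __file__ is None.
        return "namespace" if getattr(module, "__file__", None) is None else "regular"
    finally:
        sys.path.remove(str(root))
        sys.modules.pop(name, None)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "implicit_pkg").mkdir()  # directory only, no __init__.py
    (root / "explicit_pkg").mkdir()
    (root / "explicit_pkg" / "__init__.py").write_text("")
    implicit_kind = describe_package(root, "implicit_pkg")
    explicit_kind = describe_package(root, "explicit_pkg")

print(implicit_kind, explicit_kind)
```

Both layouts import successfully, which is why adding an empty __init__.py below the top-level namespace changes what tooling can see without changing what user code can import.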

Suggestions (non-blocking)

Confirm DISABLE_DATA_DESIGNER_PLUGINS matches the env var the discovery code actually reads.
Replace the "x" emoji placeholder in implement.md's column generator example with a real emoji.
Prefer Series.str.contains(...) over astype(str).apply(lambda …) in the processor examples, or add a one-line note that the .apply form is for illustration.
Restore the tests_e2e/ pointer in the "Multiple plugins in one package" section.
Optional: keep one sentence listing what assert_valid_plugin actually checks, since the deleted page had that and it's useful signal.

Verdict

Looks good to merge. The restructure is a clear improvement over the old single-example layout, the docstring additions are well-scoped and match existing style, and the __init__.py additions are the correct fix for mkdocstrings discovery without breaking the namespace-package invariant. The suggestions above are small polish items and none of them should block this PR.

greptile-apps · 2026-05-04T18:52:18Z

Greptile Summary

This PR graduates the Data Designer plugin system out of experimental status by restructuring the plugin docs (replacing the old example and filesystem-seed-reader guides with consolidated "Build Your Own" and "Using Models" pages), reorganizing the Code Reference nav into Config / Engine / Interface package groups, and expanding docstrings throughout so generated API docs are useful. It also adds __init__.py files in engine subpackages for mkdocstrings discovery and adds CSS for wide generated-API tables.

Nav restructure: Plugins nav now has Overview, Build Your Own, Using Models, and Available Plugins; Code Reference nav is split into Config, Engine, and Interface subgroups with overview pages and updated cross-links from every concepts/recipes page.
Source changes: New __init__.py files in engine/processing, engine/processing/processors, and engine/resources enable mkdocstrings/griffe to discover those subpackages; docstrings expanded on base generator, seed reader, and plugin classes; config.strategy → config.generation_strategy typo fixed in custom.py; processors-outputs/ → processors-files/ corrected in the processors concept page to match the actual artifact folder name.
url_download: true added to pymdownx.snippets to enable future remote markdown snippet inclusion (not yet used by any current page).

Confidence Score: 5/5

Safe to merge — all changes are documentation, docstrings, CSS, and mkdocstrings discovery shims with no runtime logic modifications.

The diff touches Python source files only to add docstrings and empty __init__.py files; the one runtime-visible change is a typo fix in a CustomColumnGenerator docstring and a documentation correction for the actual processor output folder name. No execution paths, data models, or API contracts are altered.

No files require special attention. The mkdocs.yml nav restructure is internally consistent and all cross-links from concepts/recipes pages have been updated to the new paths.

Important Files Changed

| Filename | Overview |
| --- | --- |
| mkdocs.yml | Nav restructured from flat reference pages to Config/Engine/Interface groups; url_download: true added to pymdownx.snippets for future remote snippet support; all entries correctly reference new file paths. |
| docs/plugins/build_your_own.md | New consolidated plugin authoring guide with working examples for all three plugin types (column generator, seed reader, processor); replaces example.md and filesystem_seed_reader.md. |
| docs/plugins/models.md | New guide for model-backed plugin patterns using ColumnGeneratorWithModel and ColumnGeneratorWithModelRegistry; examples and registry access patterns are accurate to the codebase. |
| packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py | Docstrings added to ColumnGeneratorCellByCell and ColumnGeneratorFullColumn; abstract generate() methods now have docstring bodies instead of ellipsis, which is valid Python, and subclasses still must override. |
| packages/data-designer-config/src/data_designer/plugins/plugin.py | Class-level docstrings added to PluginType and Plugin; typo fix in config_qualified_name field description. |
| docs/concepts/processors.md | Corrected output directory reference from processors-outputs/ to processors-files/ matching the actual PROCESSORS_OUTPUTS_FOLDER_NAME constant; plugin authoring prose trimmed with link to new Build Your Own page. |
| docs/css/mkdocstrings.css | New CSS rules make wide auto-generated API tables horizontally scrollable with fixed column widths and wrappable code annotations. |
| packages/data-designer-engine/src/data_designer/engine/processing/__init__.py | New empty __init__.py (license header only) added so mkdocstrings/griffe can discover the processing subpackage. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Plugins Nav
        PO[Overview]
        PB[Build Your Own]
        PM[Using Models]
        PA[Available Plugins]
    end

    subgraph Code Reference Nav
        CR[Overview]
        subgraph Config
            CC[analysis / column_configs / config_builder / data_designer_config / mcp / models / plugins / processors / run_config / sampler_params / seeds / validator_params]
        end
        subgraph Engine
            CE[column_generators / mcp / processors / seed_readers]
        end
        subgraph Interface
            CI[data_designer / errors / results]
        end
    end

    subgraph Removed
        EX[example.md]
        FS[filesystem_seed_reader.md]
        FLAT[Flat reference pages]
    end

    EX -->|replaced by| PB
    FS -->|replaced by| PB
    FLAT -->|replaced by| Config
    FLAT -->|replaced by| Engine
    FLAT -->|replaced by| Interface

    subgraph Discovery Shims
        PI1[engine/processing/__init__.py]
        PI2[engine/processing/processors/__init__.py]
        PI3[engine/resources/__init__.py]
    end

    PI1 & PI2 & PI3 -->|enables griffe to find| CE

Reviews (9): Last reviewed commit: "docs: update available plugins page"

andreatgretel · 2026-05-04T20:50:09Z

docs/code_reference/plugins.md:5 plus the three new __init__.py files (engine/resources/, engine/processing/, engine/processing/processors/)

The PR description says these __init__.py files exist "so griffe (mkdocstrings) can discover SeedReader, FileSystemSeedReader, and Processor for the new code reference." But I couldn't find any ::: directive in docs/ that targets data_designer.engine.resources.* or data_designer.engine.processing.* — those classes only appear inside code-block from … import … statements, which mkdocstrings doesn't process. Codex flagged the same thing.

As shipped, only column generators get an actual mkdocstrings-rendered API reference (via code_reference/generators.md). Processor and seed-reader authors still have prose examples but no auto-rendered base-class reference like the PR description promises.

Either land code_reference/seed_readers.md and code_reference/processors.md (engine-side) here to mirror the generators.md pattern, or drop the __init__.py files and the docstring-only churn on Processor/SeedReader/FileSystemSeedReader/SeedSource/FileSystemSeedSource/ProcessorConfig until a follow-up PR delivers the rendered pages. wdyt?
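The check described in this comment (which identifiers are targeted by a `:::` directive, as opposed to merely appearing in code samples) can be scripted. This is a minimal sketch: the helper name and the demo package names are hypothetical, and it assumes directives are written as `::: dotted.identifier` on their own line.

```python
import re
import tempfile
from pathlib import Path

# mkdocstrings autodoc directives look like "::: dotted.identifier".
DIRECTIVE = re.compile(r"^\s*:::\s+([\w.]+)\s*$")
FENCE = "`" * 3  # built indirectly so this sample nests cleanly in markdown


def find_mkdocstrings_targets(docs_dir: Path) -> set[str]:
    """Collect identifiers targeted by ':::' directives, ignoring fenced code blocks."""
    targets: set[str] = set()
    for page in sorted(docs_dir.rglob("*.md")):
        in_fence = False
        for line in page.read_text(encoding="utf-8").splitlines():
            if line.lstrip().startswith(FENCE):
                in_fence = not in_fence
                continue
            match = DIRECTIVE.match(line)
            if match and not in_fence:
                targets.add(match.group(1))
    return targets


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    # Hypothetical pages: one real directive, one import inside a code sample.
    (docs / "generators.md").write_text("# Generators\n\n::: example_pkg.generators.Base\n")
    (docs / "guide.md").write_text(
        f"{FENCE}python\nfrom example_pkg.processing import Processor\n{FENCE}\n"
    )
    targets = find_mkdocstrings_targets(docs)

print(sorted(targets))
```

Filtering the result by a module prefix then shows at a glance whether a given subpackage has any rendered reference at all.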

- Add an Implementation base section to code_reference/processors.md rendering the engine-side Processor class. This justifies the engine/processing/__init__.py files added earlier and gives processor plugin authors an auto-rendered API reference, matching the pattern used by code_reference/generators.md and seed_readers.md.
- build_your_own.md: replace the placeholder "x" emoji on the IndexMultiplier example with the actual multiplication sign.
- build_your_own.md: drop the manual `re.compile + apply(lambda)` pattern in the regex-filter processor in favor of the idiomatic `Series.str.contains(..., regex=True)`.
- build_your_own.md: add a kernel-restart caveat after the editable install instructions. PluginRegistry caches discovery on first import, so notebooks need a fresh kernel to pick up freshly installed plugins.
- build_your_own.md: state explicitly what `assert_valid_plugin` checks (config base + plugin-type-appropriate impl base).
- code_reference/plugins.md: link out to the processors code reference alongside generators and seed_readers.

johnnygreco · 2026-05-05T16:28:45Z

@andreatgretel Re: #603 (comment)

This is addressed in the current PR state. The branch now has docs/code_reference/engine/seed_readers.md and docs/code_reference/engine/processors.md, both included under the Engine code-reference nav, and both pages contain mkdocstrings ::: directives for the relevant engine-side classes. I kept the __init__.py files and docstring work because those rendered reference pages now exist.

github-actions · 2026-05-05T17:16:40Z

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

johnnygreco · 2026-05-05T17:17:14Z

@nabinchha could you take a close look at the Using Models in Plugins section?

Link: https://github.com/NVIDIA-NeMo/DataDesigner/blob/johnny/docs/plugins-out-of-experimental-mode/docs/plugins/models.md

We want to establish a good pattern for using models in plugins, especially the recommended base class split for single-model vs multi-model generators and the alias validation / health-check behavior.

nabinchha · 2026-05-05T20:41:32Z

@johnnygreco — took a close pass at docs/plugins/models.md. Cross-checked it against the engine code; the base-class split and the discovery → health-check chain are described correctly. A few things worth tightening, plus one engine-side limitation that came out of the review.

Things I'd change in this PR

Tighten the wording around model_alias being required. The doc currently says "the config should keep a primary model_alias field because startup health checks collect that field…". Because _run_model_health_check_if_needed does model_aliases.add(config.model_alias) unconditionally for any column type whose impl inherits ColumnGeneratorWithModelRegistry, it's effectively required, not advisory — without it, plugin users get an AttributeError from inside the health-check loop before the friendlier registry "alias not found" error ever runs. Suggest something like "The config must include a model_alias: str field — startup health checks read it directly off any column config whose generator inherits from ColumnGeneratorWithModelRegistry (including via ColumnGeneratorWithModel)."
Show the PairwiseJudgeColumnConfig alongside the multi-model generator example. The single-model example shows both halves; the multi-model one only imports PairwiseJudgeColumnConfig from data_designer_pairwise_judge.config, which makes it harder for readers to see that the config defines both model_alias and judge_model_alias. A small config snippet (or an inline comment) closes the loop with point 1 and makes it visually obvious which alias gets the standard health check vs which one only gets the _validate() resolution.
Sharpen the alias-validation note. "Validate additional alias fields in _validate()… so missing aliases fail before generation starts" is true, but readers may infer a model health check happens. Something like "get_model_config(alias) only verifies the alias is registered; it does not call the endpoint. Endpoint reachability is only exercised for the primary model_alias collected by the standard startup health check."
Tiny copy nit: "The engine already builds a ResourceProvider for each generator" reads as one-per-generator; in practice it's one ResourceProvider per builder shared with each generator. Easy fix: "…builds a ResourceProvider and exposes its model registry to every generator at:".
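The failure mode in the first item is easy to reproduce in isolation. The sketch below is a stand-in, not the engine code: the config classes and the collection function are hypothetical, and only the unconditional read of model_alias mirrors what the review describes.

```python
from dataclasses import dataclass


@dataclass
class ConfigWithPrimaryAlias:
    name: str
    model_alias: str


@dataclass
class ConfigWithoutPrimaryAlias:
    name: str
    judge_model_alias: str  # secondary alias only; no model_alias field


def collect_health_check_aliases(configs):
    """Stand-in for the startup loop: read model_alias off each config unconditionally."""
    aliases = set()
    for config in configs:
        aliases.add(config.model_alias)
    return aliases


collected = collect_health_check_aliases([ConfigWithPrimaryAlias("summary", "small-model")])

try:
    collect_health_check_aliases([ConfigWithoutPrimaryAlias("judge", "judge-model")])
    failure = None
except AttributeError as exc:
    # The user sees a bare attribute error, not a friendly "alias not found" message.
    failure = str(exc)

print(collected, failure)
```

This is why the wording "must include" is more accurate than "should keep": the missing field fails before any registry lookup has a chance to produce a clearer error.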

Engine-side limitation surfaced by the review

The reason point 3 needs to exist at all is that secondary aliases on a packaged plugin config can't be opted into the standard startup health check today — only CustomColumnConfig.model_aliases (plural) is rolled in via an isinstance branch in the builder. For a packaged plugin with model_alias + judge_model_alias, only the primary alias gets the endpoint ping; the secondary alias's reachability and credentials only surface at first generation call.

I filed #606 to propose a small fix: a get_model_aliases() accessor on SingleColumnConfig that defaults to [self.model_alias] (preserving current behavior) and that plugin configs override to declare every alias they depend on. The builder's isinstance(config, CustomColumnConfig) branch collapses into the same path, and the docs in this PR can switch from "validate manually in _validate()" to "override get_model_aliases()". Happy to do that as a follow-up to #603 once the docs land, or fold it in if you'd rather ship them together.
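As a rough illustration of the accessor proposed in #606, the shape could look like the following. Everything here is a sketch of the proposal as described in this comment, using stand-in classes; it is not an existing Data Designer API, and the final design may differ.

```python
class SingleColumnConfigSketch:
    """Stand-in base config; defaults to declaring only the primary alias."""

    def __init__(self, model_alias: str):
        self.model_alias = model_alias

    def get_model_aliases(self) -> list[str]:
        return [self.model_alias]


class PairwiseJudgeConfigSketch(SingleColumnConfigSketch):
    """Stand-in plugin config that depends on two models."""

    def __init__(self, model_alias: str, judge_model_alias: str):
        super().__init__(model_alias)
        self.judge_model_alias = judge_model_alias

    def get_model_aliases(self) -> list[str]:
        # Override to declare every alias the startup health check should cover.
        return [self.model_alias, self.judge_model_alias]


def aliases_to_health_check(configs) -> set[str]:
    """One code path for all configs; no isinstance branch per config type."""
    aliases: set[str] = set()
    for config in configs:
        aliases.update(config.get_model_aliases())
    return aliases


checked = aliases_to_health_check(
    [
        SingleColumnConfigSketch("small-model"),
        PairwiseJudgeConfigSketch("small-model", "judge-model"),
    ]
)
print(sorted(checked))
```

With this shape, the default preserves today's behavior and a plugin opts its secondary aliases into the health check by overriding one method.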

Verdict on the docs alone: the substance is right, the four items above are polish, and the multi-model alias story will be much cleaner once #606 lands.

andreatgretel · 2026-05-06T12:45:48Z

Bundling the code reference reorg with plugins-graduation makes sense given plugin authors now need to navigate engine.column_generators, engine.processing.processors, etc. as a real public surface. The thing I'd still flag: the reorg landed across ~10 commits and grew the diff from 19 files to 72, which makes reviewing harder and links the two for reverts. Not blocking, just wanted to surface it. A sentence in the PR description tying the two together would help future readers.

johnnygreco · 2026-05-06T15:59:17Z

@nabinchha thanks for the close read. I pushed 4c3365ee to address the PR-local docs items:

made model_alias: str required wording explicit for ColumnGeneratorWithModelRegistry / ColumnGeneratorWithModel configs
added the missing PairwiseJudgeColumnConfig snippet so both model_alias and judge_model_alias are visible next to the multi-model generator example
clarified that get_model_config(alias) only validates registration, and that endpoint reachability is only covered by the standard startup health check for the primary model_alias
fixed the ResourceProvider wording to describe the shared provider exposed to generators

I left the #606 engine/API improvement as follow-up rather than documenting get_model_aliases() in this PR before it exists.

andreatgretel

Three follow-ups from a final consistency pass — one is a real failure on the default engine. Posting as a single review since GitHub's per-comment endpoint is currently 422-ing.

`docs/plugins/build_your_own.md:240` (processor example)

Codex caught this: this example raises DatasetProcessingError on the default engine. dataset_builder.py:72 sets DATA_DESIGNER_ASYNC_ENGINE default to "1", the async path passes strict_row_count=True into the runner (dataset_builder.py:462,471), and processor_runner.py:86,107 raises on any row-count delta. Filter via data[mask].reset_index(...) returns a shorter frame, so the very first batch fails. The comment at dataset_builder.py:69-71 also says the sync engine is scheduled for removal after one transitional release, so DATA_DESIGNER_ASYNC_ENGINE=0 isn't a durable fix.

Two clean options: (a) move the filter to process_after_generation() so the example is row-count-stable everywhere, or (b) keep process_before_batch and add an inline !!! warning linking to concepts/processors.md#row-count-changes so readers don't copy-paste a broken example. Leaning toward (a) since "filter rows by regex" is the obvious first plugin people will write. wdyt?
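The mechanics of the failure can be sketched without the engine. The runner, error class, and row filter below are hypothetical stand-ins that use plain lists instead of DataFrames; only the rule "raise on any row-count delta when strict_row_count is on" is taken from the description above.

```python
class RowCountChangedError(Exception):
    """Hypothetical stand-in for the DatasetProcessingError mentioned above."""


def run_batch_processor(process, batch, strict_row_count=True):
    """Run a per-batch callback and reject any change in the number of rows."""
    result = process(batch)
    if strict_row_count and len(result) != len(batch):
        raise RowCountChangedError(
            f"processor changed row count from {len(batch)} to {len(result)}"
        )
    return result


def keep_rows_with_digits(rows):
    """A filtering processor: returns fewer rows than it receives."""
    return [row for row in rows if any(ch.isdigit() for ch in row["text"])]


batch = [{"text": "a1"}, {"text": "b"}, {"text": "c3"}]

try:
    run_batch_processor(keep_rows_with_digits, batch)
    per_batch_outcome = "accepted"
except RowCountChangedError:
    per_batch_outcome = "rejected"

# Option (a): apply the same filter once generation is complete, where this
# sketch assumes no per-batch row-count check is in force.
final_rows = keep_rows_with_digits(batch)

print(per_batch_outcome, len(final_rows))
```

The same filter is rejected as a per-batch step and works as a post-generation step, which is the argument for moving the docs example to process_after_generation().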

`docs/code_reference/interface/errors.md:3` (intro paragraph)

The intro says this page covers errors at the public API boundary, but DataDesignerEarlyShutdownError isn't documented here. It's exported from data_designer.interface (interface/__init__.py:13-22), defined with a real docstring in interface/errors.py (subclass of DataDesignerGenerationError, raised on early shutdown with no records), and concepts/architecture-and-performance.md:290-300 already has a try/except DataDesignerEarlyShutdownError: example telling users to catch it for retry-with-different-alias flows. Worth a fourth ::: section to match the other three.
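For context on why the error deserves a reference entry, the retry-with-different-alias flow it supports looks roughly like this. The exception classes and the generate callable are local stand-ins so the sketch runs on its own; real code would catch DataDesignerEarlyShutdownError from data_designer.interface instead.

```python
class GenerationErrorSketch(Exception):
    """Stand-in for the generation error base class."""


class EarlyShutdownSketch(GenerationErrorSketch):
    """Stand-in for the early-shutdown error raised when no records are produced."""


def generate_with_alias_fallback(generate, aliases):
    """Try each model alias in order; move on only after an early shutdown."""
    if not aliases:
        raise ValueError("at least one model alias is required")
    last_error = None
    for alias in aliases:
        try:
            return alias, generate(alias)
        except EarlyShutdownSketch as exc:
            last_error = exc
    raise last_error


def fake_generate(alias):
    """Hypothetical generator call: the first alias shuts down early."""
    if alias == "primary-model":
        raise EarlyShutdownSketch("no records produced")
    return [{"alias": alias, "value": 1}]


used_alias, records = generate_with_alias_fallback(
    fake_generate, ["primary-model", "backup-model"]
)
print(used_alias, len(records))
```

Catching the specific subclass rather than the base generation error keeps other failures visible instead of silently retrying them.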

`docs/code_reference/engine/seed_readers.md` (stub-only symbols)

nit: 5 of the 7 symbols surfaced on this page have no docstring, so they'll render as near-empty stubs in the generated reference: SeedReaderFileSystemContext, SeedReaderBatch, SeedReaderBatchReader, PandasSeedReaderBatch, and create_seed_reader_output_dataframe (engine/resources/seed_reader.py:53,58,62,67,74). SeedReader and FileSystemSeedReader both have real prose and are the load-bearing entries. Either add one-line docstrings for the five protocol/dataclass/helper symbols, or drop them from the page and let mkdocstrings only render what has documentation. wdyt?

Also a pre-existing prose drift outside the diff hunks worth fixing while you're in this area: docs/concepts/processors.md:91 says outputs land in processors-outputs/{name}/, but the actual on-disk folder is processors-files/ — engine/storage/artifact_storage.py:31 defines PROCESSORS_OUTPUTS_FOLDER_NAME = "processors-files", and data-designer-config/.../processors.py:69 agrees. Only the prose disagrees with what gets written.

johnnygreco · 2026-05-06T18:50:19Z

@andreatgretel thanks for the final pass. I pushed 2403fc4a to address these:

changed the regex-filter processor example to use process_after_generation() so row filtering works on the default async engine
added DataDesignerEarlyShutdownError to the interface errors reference page
added one-line docstrings for the seed-reader context, batch protocols, pandas batch wrapper, and output DataFrame helper rendered on the seed readers reference page
fixed the processor output path prose from processors-outputs/{name}/ to processors-files/{name}/

Validation: uv run --group docs mkdocs build passes with the existing docs warnings.

Griffe (used by mkdocstrings) skips directories without __init__.py when resolving module paths, which prevented the new plugins code reference from rendering SeedReader, FileSystemSeedReader, and Processor. Adding empty __init__.py files in engine/resources/, engine/processing/, and engine/processing/processors/ aligns with the convention already used in engine/mcp/, engine/models/, etc.

Plugin authors now see meaningful descriptions for every field and method on the bases rendered in the plugins code reference:
- Plugin and PluginType: class docstrings + Attributes tables for fields and enum members; fix typo in config_qualified_name field description.
- SingleColumnConfig: document allow_resize.
- ProcessorConfig: document processor_type discriminator.
- SeedSource: document seed_type discriminator.
- FileSystemSeedSource: add class docstring + Attributes table for path / file_pattern / recursive.
- ColumnGeneratorFullColumn and ColumnGeneratorCellByCell: add class docstrings explaining when to use each base, plus method docstrings on the abstract generate() implementations.

Restructures plugin documentation around the now-stable extension points (column generator, seed reader, processor) and treats plugins as a first-class story for customizing Data Designer.
- Add code_reference/plugins.md: single-stop reference for the Plugin object and the config + implementation base classes used by all three plugin types.
- Add code_reference/generators.md: column generator implementation base classes, separated from column configs.
- Surface SingleColumnConfig in code_reference/column_configs.md.
- Add plugins/implement.md ("Build Your Own"): per-type implementation instructions across column generators, seed readers, and processors.
- Add plugins/processor.md: complete processor plugin package example.
- Rewrite plugins/overview.md: open with why plugins exist, drop the internal-helpers note (PluginRegistry / PluginManager), and focus the guide on what plugin builders need.
- Refresh plugins/available.md (Catalog) and plugins/filesystem_seed_reader.md to match the new structure.
- Delete plugins/example.md (replaced by per-type guides).
- Reorder Code Reference nav alphabetically and add the new pages.
- Minor link / wording fixes in concepts/processors.md and concepts/deployment-options.md.

Replace the overview's how-to walkthrough and the per-type plugin guides with a single Build Your Own page that covers all three plugin types side-by-side. Add a dedicated Using Models in Plugins guide and a seed_readers code reference, and trim the overview down to what the plugin types are, how to use one, and how discovery works.
- Rename plugins/implement.md to plugins/build_your_own.md.
- Delete plugins/filesystem_seed_reader.md and plugins/processor.md (their content is now in build_your_own.md and the per-type code references).
- Add plugins/models.md for model-backed column generator authoring.
- Add code_reference/seed_readers.md for seed reader implementation base classes.
- Rewrite plugins/overview.md: shorter intro, type bullets link to the relevant code reference, drop the multi-step "How do you create plugins" walkthrough in favor of a single Build a Plugin pointer, tighten Discovery troubleshooting.
- Refresh plugins/available.md (Available Plugins): point to the DataDesignerPlugins catalog and explain how to request a community listing.
- Update cross-page links in concepts/processors.md, concepts/seed-datasets.md, recipes/plugin_development/markdown_seed_reader.md, code_reference/plugins.md, and code_reference/generators.md to match the new structure.
- Update mkdocs.yml nav: rename to Build Your Own, add Using Models, add seed_readers code reference.

Code-heavy reference tables (plugin bases, column generators, etc.) were wrapping aggressively on narrow viewports, breaking long identifiers across multiple lines. Switch the table container to horizontal overflow and prevent code cells from wrapping so identifiers stay readable.

johnnygreco requested a review from a team as a code owner May 4, 2026 18:50

johnnygreco temporarily deployed to agentic-ci May 4, 2026 18:50 — with GitHub Actions Inactive

greptile-apps Bot reviewed May 4, 2026

Comment thread mkdocs.yml Outdated

andreatgretel reviewed May 4, 2026

Comment thread docs/plugins/overview.md Outdated

andreatgretel reviewed May 4, 2026

Comment thread docs/plugins/build_your_own.md

johnnygreco force-pushed the johnny/docs/plugins-out-of-experimental-mode branch from 08d153d to 1e93465 Compare May 5, 2026 16:30

johnnygreco force-pushed the johnny/docs/plugins-out-of-experimental-mode branch from b84bae9 to a88c702 Compare May 5, 2026 17:20

andreatgretel mentioned this pull request May 5, 2026

Agentic CI: Issue & PR Triage Tracker #562

Open

nabinchha mentioned this pull request May 5, 2026

feat(engine): let plugin column configs declare all model aliases for the startup health check #606

Open

2 tasks

andreatgretel reviewed May 6, 2026

Comment thread docs/plugins/build_your_own.md

andreatgretel reviewed May 6, 2026

Comment thread docs/plugins/available.md Outdated

andreatgretel reviewed May 6, 2026

andreatgretel previously approved these changes May 6, 2026

johnnygreco dismissed andreatgretel’s stale review via c49880f May 6, 2026 20:31

johnnygreco added 3 commits May 6, 2026 20:33

johnnygreco added 17 commits May 6, 2026 20:33

docs: split code reference by package

5e7f15b

docs: add interface code reference

f894af0

docs: add code reference overviews

62d5eff

docs: refine code reference pages

a9a3098

docs: improve code reference tables

344f7e0

docs: correct reference docstrings

9879f85

docs: embed plugin catalog table

7ea13bf

docs: note plugin discovery restart caveat

c0cc59f

docs: explain generator base class choice

52e2e6a

docs: mention async cell generator examples

dab3a77

docs: clarify plugin model usage

b957570

docs: clarify plugin model aliases

c8a9c07

docs: address plugin review feedback

fd46b10

docs: update available plugins page

aae652e

johnnygreco force-pushed the johnny/docs/plugins-out-of-experimental-mode branch from c49880f to aae652e Compare May 6, 2026 20:33

andreatgretel approved these changes May 6, 2026

johnnygreco merged commit 8b8d748 into main May 6, 2026
50 checks passed

Conversation

johnnygreco commented May 4, 2026 • edited

Summary

Changes

Added

Changed

Removed

Attention Areas

Validation

github-actions Bot commented May 4, 2026 • edited

github-actions Bot commented May 4, 2026

Review: PR #603 — docs: graduate plugins out of experimental mode

Summary

Findings

Correctness

Example code quality

Documentation accuracy

Style / conventions

Risks

Suggestions (non-blocking)

Verdict

greptile-apps Bot commented May 4, 2026 • edited

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

andreatgretel commented May 4, 2026

johnnygreco commented May 5, 2026

github-actions Bot commented May 5, 2026 • edited

johnnygreco commented May 5, 2026

nabinchha commented May 5, 2026

andreatgretel commented May 6, 2026

johnnygreco commented May 6, 2026

andreatgretel left a comment • edited

docs/plugins/build_your_own.md:240 (processor example)

docs/code_reference/interface/errors.md:3 (intro paragraph)

docs/code_reference/engine/seed_readers.md (stub-only symbols)

johnnygreco commented May 6, 2026

3 participants
