
docs: graduate plugins out of experimental mode #603

Merged
johnnygreco merged 20 commits into main from
johnny/docs/plugins-out-of-experimental-mode
May 6, 2026

Conversation

@johnnygreco
Contributor

@johnnygreco johnnygreco commented May 4, 2026

Summary

Updates the docs around two related areas: plugin authoring now that the extension points are no longer experimental, and the code reference section so APIs are grouped by package/layer and have enough context to be useful from the docs site.

Changes

Added

  • docs/plugins/build_your_own.md as the consolidated guide for building column generator, seed reader, and processor plugins.
  • docs/plugins/models.md for model-backed plugin patterns and model registry usage.
  • Package/layer-oriented code reference pages under docs/code_reference/config/, docs/code_reference/engine/, and docs/code_reference/interface/, with overview pages for each group.
  • __init__.py files in engine resources and processing subpackages so mkdocstrings/griffe can discover seed reader and processor classes.

Changed

  • Reworked the Plugins nav and overview around Overview, Build Your Own, Using Models, and Available Plugins.
  • Embedded the NVIDIA-maintained plugin catalog table from the DataDesignerPlugins repo in docs/plugins/available.md.
  • Reorganized the Code Reference nav in mkdocs.yml by Config, Engine, and Interface, with updated cross-links from concepts and recipes.
  • Expanded and corrected docstrings for plugin extension points, config objects, generators, seed readers, processors, interface classes, and analysis/config references so generated docs render with useful field and method descriptions.
  • Improved code reference table styling so wide generated tables remain readable on narrower viewports.

Removed

  • Replaced the older plugin example pages (example.md, filesystem_seed_reader.md, processor.md) with the consolidated Build Your Own guide and targeted reference pages.
  • Replaced the older flat code reference pages with package-grouped code reference pages.

Attention Areas

Reviewers: Please pay special attention to the following:

  • mkdocs.yml - navigation moved from flat reference pages to package groups, and remote markdown snippets are now enabled for the plugin catalog table.
  • docs/plugins/available.md - the NVIDIA plugin table is pulled from the raw DataDesignerPlugins catalog during docs builds.
  • docs/code_reference/ - page paths and anchors changed as part of the Config / Engine / Interface split.
  • packages/data-designer-*/src/data_designer/ docstrings and discovery shims - these are mostly documentation-facing changes, but they affect what mkdocstrings exposes in the generated docs.

Validation

  • uv run --group docs mkdocs build

Description updated with AI

@johnnygreco johnnygreco requested a review from a team as a code owner May 4, 2026 18:50
@github-actions
Contributor

github-actions Bot commented May 4, 2026

Docs preview: https://2674aef1.dd-docs-preview.pages.dev

Notebook tutorials are placeholder-only in previews.

@github-actions
Contributor

github-actions Bot commented May 4, 2026

Review: PR #603 (docs: graduate plugins out of experimental mode)

Summary

This PR reorganizes plugin documentation now that the three plugin extension points (column generator, seed reader, processor) are stable. It removes the "Experimental Feature" banners, replaces the single plugins/example.md walkthrough with per-type implementation guides (implement.md, processor.md, expanded filesystem_seed_reader.md), and adds proper API reference pages (code_reference/plugins.md, code_reference/generators.md). The plugins/available.md page becomes a real "Catalog" that points to the NVIDIA-maintained DataDesignerPlugins repo.

Code changes are minimal and non-functional:

  • Docstrings added to Plugin, PluginType, SingleColumnConfig.allow_resize, ProcessorConfig.processor_type, SeedSource, FileSystemSeedSource, ColumnGeneratorCellByCell, and ColumnGeneratorFullColumn.
  • A typo fix in Plugin.config_qualified_name's description ("name o the" → "name of the").
  • Three empty __init__.py files under data_designer.engine so griffe/mkdocstrings can resolve SeedReader, FileSystemSeedReader, and Processor for the new reference pages.
  • mkdocs.yml nav reshuffle + alphabetizing of Code Reference pages.

Scope matches what the PR description claims.

Findings

Correctness

  • __init__.py additions respect the namespace-package invariant. AGENTS.md pins the PEP 420 rule at the top-level data_designer namespace. The three new files live under data_designer.engine.resources, data_designer.engine.processing, and data_designer.engine.processing.processors, all inside a single distribution package — this is the normal way to expose subpackages for griffe and does not break the cross-distribution namespace merge. No concern.
  • Nav + anchors cross-link correctly. Spot-checked docs/code_reference/plugins.md {#data_designer.plugins.plugin.Plugin} anchor against the references in docs/plugins/overview.md, docs/plugins/implement.md, docs/code_reference/generators.md, and the column_configs.md addition for SingleColumnConfig. All the page-relative links (../plugins/implement.md, ../code_reference/generators.md, etc.) match the new filenames in this PR.
  • Plugin description typo fix matches the docstring fix. Both say "name of the …" now. Good.
  • Env-var documentation. overview.md states DISABLE_DATA_DESIGNER_PLUGINS=true disables entry point discovery. Verify the name matches the actual variable in the discovery code — docs of this kind rot silently if the flag is renamed.

Example code quality

  • Processor example uses astype(str).apply(lambda …) in both implement.md and processor.md. Idiomatic pandas would be data[self.config.column].astype(str).str.contains(self.config.pattern, regex=True); pre-compiling the pattern is unnecessary when using the Series accessor. As a "minimum working example" it's fine, but a short note that vectorized .str.contains is preferable for real workloads would help new plugin authors.
  • get_column_emoji() returns "x" in implement.md where the old example.md used "✖️". Intentional simplification is fine, but x looks like a placeholder — consider a real emoji so readers don't copy a bare letter into their log output.
  • Import style is consistent across the three tab examples: from __future__ import annotations + TYPE_CHECKING for pandas when only used in annotations. Good — this mirrors the style guide's fast-import guidance.
  • Multiple-plugins-per-package section dropped the tests_e2e reference. The removed example.md pointed at tests_e2e/ as a concrete example of this pattern; implement.md's "Multiple plugins in one package" section just shows a TOML snippet. If that e2e directory is still a working example, add the link back — it's a cheap pointer that saves plugin authors from guessing.
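
For reference, the two filter styles discussed in the first bullet can be compared side by side. This is a standalone sketch on a toy DataFrame, not the code from implement.md; the column name and pattern are made up for the example.

```python
import re

import pandas as pd

# Toy data; the column name "text" and the pattern are illustrative only.
data = pd.DataFrame({"text": ["alpha-1", "beta", "gamma-22", "delta"]})
pattern = r"\d"

# Row-wise form: compile once, then call the regex for every value.
compiled = re.compile(pattern)
mask_apply = data["text"].astype(str).apply(lambda value: bool(compiled.search(value)))

# Vectorized form: the Series string accessor handles the regex directly.
mask_vectorized = data["text"].astype(str).str.contains(pattern, regex=True)

# Both masks select the same rows.
same_mask = mask_apply.tolist() == mask_vectorized.tolist()
filtered = data[mask_vectorized].reset_index(drop=True)
```

Both forms produce the same mask here; the accessor version is shorter and avoids a Python-level call per row.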

Documentation accuracy

  • assert_valid_plugin coverage. implement.md says "Data Designer provides a testing utility for common plugin structure checks" and shows a single example. The deleted example.md explicitly listed what it validates ("config is subclass of ConfigBase", etc.). The new, terser wording is fine for most readers, but the deleted enumeration was genuinely useful for the "what will this catch?" question. Worth preserving one sentence about it.
  • Discovery troubleshooting bullets in overview.md are good and concrete (discriminator must be a string, regex-filter → REGEX_FILTER, etc.). This replaces an entire "Experimental" framing with something actionable, which is a nice improvement.
  • Processor callback table in processor.md accurately lines up with the three process_* methods. The async-engine caveat note about row-count-changing pre/post-batch processors under DATA_DESIGNER_ASYNC_ENGINE=1 is a useful landmine callout.

Style / conventions

  • Docstring additions follow the existing Attributes-block format used elsewhere in base.py and plugin.py. No drift.
  • New __init__.py files use the right SPDX header (2026, Apache-2.0). Consistent with other engine packages.
  • mkdocs.yml alphabetization applies to Code Reference only, not to the top-level nav, which matches the comment # Keep code reference pages ordered alphabetically by nav label.. Confirm whether "Plugins" in the top-level nav should similarly list Catalog before Build Your Own (it currently goes Overview → Build Your Own → Catalog, which reads as a natural user journey and is probably preferable to alphabetical).

Risks

  • Low. This is almost entirely docs plus docstrings. The only runtime-observable change is the three new __init__.py files; because they are beneath a single installable package and do not introduce a data_designer/__init__.py, they cannot break the cross-package namespace merge.
  • One residual risk: if any tooling elsewhere in the repo relied on those three directories being implicit namespace packages (unlikely but worth a grep for pkgutil/find_namespace_packages usage around data_designer.engine.processing), it should still work — explicit subpackages are a strict superset of namespace behavior inside a single distribution.
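
The namespace-package reasoning above can be checked with the standard library alone. The sketch below builds a throwaway layout (the names demo_ns and engine are hypothetical, not the real package tree) in which the top level has no __init__.py and a subpackage does, then inspects how Python imports each.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: "demo_ns" has no __init__.py (PEP 420 namespace),
    # while "demo_ns/engine" is a regular package with an explicit __init__.py.
    (root / "demo_ns" / "engine").mkdir(parents=True)
    (root / "demo_ns" / "engine" / "__init__.py").write_text("")

    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        namespace_pkg = importlib.import_module("demo_ns")
        regular_pkg = importlib.import_module("demo_ns.engine")

        # A namespace package has no backing file; a regular package points at __init__.py.
        namespace_has_file = getattr(namespace_pkg, "__file__", None) is not None
        regular_has_file = getattr(regular_pkg, "__file__", None) is not None
    finally:
        # Clean up so the demo leaves no trace in the interpreter state.
        sys.path.remove(str(root))
        sys.modules.pop("demo_ns.engine", None)
        sys.modules.pop("demo_ns", None)
```

The top level stays a namespace package even though a subpackage below it is explicit, which matches the argument that __init__.py files under the engine subpackages leave the top-level namespace untouched.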

Suggestions (non-blocking)

  1. Confirm DISABLE_DATA_DESIGNER_PLUGINS matches the env var the discovery code actually reads.
  2. Replace the "x" emoji placeholder in implement.md's column generator example with a real emoji.
  3. Prefer Series.str.contains(...) over astype(str).apply(lambda …) in the processor examples, or add a one-line note that the .apply form is for illustration.
  4. Restore the tests_e2e/ pointer in the "Multiple plugins in one package" section.
  5. Optional: keep one sentence listing what assert_valid_plugin actually checks, since the deleted page had that and it's useful signal.

Verdict

Looks good to merge. The restructure is a clear improvement over the old single-example layout, the docstring additions are well-scoped and match existing style, and the __init__.py additions are the correct fix for mkdocstrings discovery without breaking the namespace-package invariant. The suggestions above are small polish items and none of them should block this PR.

@greptile-apps
Contributor

greptile-apps Bot commented May 4, 2026

Greptile Summary

This PR graduates the Data Designer plugin system out of experimental status by restructuring the plugin docs (replacing the old example and filesystem-seed-reader guides with consolidated "Build Your Own" and "Using Models" pages), reorganizing the Code Reference nav into Config / Engine / Interface package groups, and expanding docstrings throughout so generated API docs are useful. It also adds __init__.py files in engine subpackages for mkdocstrings discovery and adds CSS for wide generated-API tables.

  • Nav restructure: Plugins nav now has Overview, Build Your Own, Using Models, and Available Plugins; Code Reference nav is split into Config, Engine, and Interface subgroups with overview pages and updated cross-links from every concepts/recipes page.
  • Source changes: New __init__.py files in engine/processing, engine/processing/processors, and engine/resources enable mkdocstrings/griffe to discover those subpackages; docstrings expanded on base generator, seed reader, and plugin classes; config.strategy → config.generation_strategy typo fixed in custom.py; processors-outputs/ → processors-files/ corrected in the processors concept page to match the actual artifact folder name.
  • url_download: true added to pymdownx.snippets to enable future remote markdown snippet inclusion (not yet used by any current page).

Confidence Score: 5/5

Safe to merge — all changes are documentation, docstrings, CSS, and mkdocstrings discovery shims with no runtime logic modifications.

The diff touches Python source files only to add docstrings and empty __init__.py files; the one runtime-visible change is a typo fix in a CustomColumnGenerator docstring and a documentation correction for the actual processor output folder name. No execution paths, data models, or API contracts are altered.

No files require special attention. The mkdocs.yml nav restructure is internally consistent and all cross-links from concepts/recipes pages have been updated to the new paths.

Important Files Changed

| Filename | Overview |
| --- | --- |
| mkdocs.yml | Nav restructured from flat reference pages to Config/Engine/Interface groups; url_download: true added to pymdownx.snippets for future remote snippet support; all entries correctly reference new file paths. |
| docs/plugins/build_your_own.md | New consolidated plugin authoring guide with working examples for all three plugin types (column generator, seed reader, processor); replaces example.md and filesystem_seed_reader.md. |
| docs/plugins/models.md | New guide for model-backed plugin patterns using ColumnGeneratorWithModel and ColumnGeneratorWithModelRegistry; examples and registry access patterns are accurate to the codebase. |
| packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py | Docstrings added to ColumnGeneratorCellByCell and ColumnGeneratorFullColumn; abstract generate() methods now have docstring bodies instead of ellipsis (valid Python; subclasses still must override). |
| packages/data-designer-config/src/data_designer/plugins/plugin.py | Class-level docstrings added to PluginType and Plugin; typo fix in config_qualified_name field description. |
| docs/concepts/processors.md | Corrected output directory reference from processors-outputs/ to processors-files/ matching the actual PROCESSORS_OUTPUTS_FOLDER_NAME constant; plugin authoring prose trimmed with link to new Build Your Own page. |
| docs/css/mkdocstrings.css | New CSS rules make wide auto-generated API tables horizontally scrollable with fixed column widths and wrappable code annotations. |
| packages/data-designer-engine/src/data_designer/engine/processing/__init__.py | New empty __init__.py (license header only) added so mkdocstrings/griffe can discover the processing subpackage. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Plugins Nav
        PO[Overview]
        PB[Build Your Own]
        PM[Using Models]
        PA[Available Plugins]
    end

    subgraph Code Reference Nav
        CR[Overview]
        subgraph Config
            CC[analysis / column_configs / config_builder / data_designer_config / mcp / models / plugins / processors / run_config / sampler_params / seeds / validator_params]
        end
        subgraph Engine
            CE[column_generators / mcp / processors / seed_readers]
        end
        subgraph Interface
            CI[data_designer / errors / results]
        end
    end

    subgraph Removed
        EX[example.md]
        FS[filesystem_seed_reader.md]
        FLAT[Flat reference pages]
    end

    EX -->|replaced by| PB
    FS -->|replaced by| PB
    FLAT -->|replaced by| Config
    FLAT -->|replaced by| Engine
    FLAT -->|replaced by| Interface

    subgraph Discovery Shims
        PI1[engine/processing/__init__.py]
        PI2[engine/processing/processors/__init__.py]
        PI3[engine/resources/__init__.py]
    end

    PI1 & PI2 & PI3 -->|enables griffe to find| CE

Reviews (9): Last reviewed commit: "docs: update available plugins page"

Comment thread mkdocs.yml Outdated
Comment thread docs/plugins/overview.md Outdated
Comment thread docs/plugins/build_your_own.md
@andreatgretel
Contributor

docs/code_reference/plugins.md:5 plus the three new __init__.py files (engine/resources/, engine/processing/, engine/processing/processors/)

The PR description says these __init__.py files exist "so griffe (mkdocstrings) can discover SeedReader, FileSystemSeedReader, and Processor for the new code reference." But I couldn't find any ::: directive in docs/ that targets data_designer.engine.resources.* or data_designer.engine.processing.* — those classes only appear inside code-block from … import … statements, which mkdocstrings doesn't process. Codex flagged the same thing.

As shipped, only column generators get an actual mkdocstrings-rendered API reference (via code_reference/generators.md). Processor and seed-reader authors still have prose examples but no auto-rendered base-class reference like the PR description promises.

Either land code_reference/seed_readers.md and code_reference/processors.md (engine-side) here to mirror the generators.md pattern, or drop the __init__.py files and the docstring-only churn on Processor/SeedReader/FileSystemSeedReader/SeedSource/FileSystemSeedSource/ProcessorConfig until a follow-up PR delivers the rendered pages. wdyt?

johnnygreco added a commit that referenced this pull request May 5, 2026
- Add an Implementation base section to code_reference/processors.md
  rendering the engine-side Processor class. This justifies the
  engine/processing/__init__.py files added earlier and gives
  processor plugin authors an auto-rendered API reference, matching
  the pattern used by code_reference/generators.md and seed_readers.md.
- build_your_own.md: replace the placeholder "x" emoji on the
  IndexMultiplier example with the actual multiplication sign.
- build_your_own.md: drop the manual `re.compile + apply(lambda)`
  pattern in the regex-filter processor in favor of the idiomatic
  `Series.str.contains(..., regex=True)`.
- build_your_own.md: add a kernel-restart caveat after the editable
  install instructions — PluginRegistry caches discovery on first
  import, so notebooks need a fresh kernel to pick up freshly
  installed plugins.
- build_your_own.md: state explicitly what `assert_valid_plugin`
  checks (config base + plugin-type-appropriate impl base).
- code_reference/plugins.md: link out to the processors code
  reference alongside generators and seed_readers.
@johnnygreco
Contributor Author

@andreatgretel Re: #603 (comment)

This is addressed in the current PR state. The branch now has docs/code_reference/engine/seed_readers.md and docs/code_reference/engine/processors.md, both included under the Engine code-reference nav, and both pages contain mkdocstrings ::: directives for the relevant engine-side classes. I kept the __init__.py files and docstring work because those rendered reference pages now exist.

@johnnygreco johnnygreco force-pushed the johnny/docs/plugins-out-of-experimental-mode branch from 08d153d to 1e93465 Compare May 5, 2026 16:30
@github-actions
Contributor

github-actions Bot commented May 5, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@johnnygreco
Contributor Author

@nabinchha could you take a close look at the Using Models in Plugins section?

Link: https://github.com/NVIDIA-NeMo/DataDesigner/blob/johnny/docs/plugins-out-of-experimental-mode/docs/plugins/models.md

We want to establish a good pattern for using models in plugins, especially the recommended base class split for single-model vs multi-model generators and the alias validation / health-check behavior.

@nabinchha
Contributor

@johnnygreco — took a close pass at docs/plugins/models.md. Cross-checked it against the engine code; the base-class split and the discovery → health-check chain are described correctly. A few things worth tightening, plus one engine-side limitation that came out of the review.

Things I'd change in this PR

  1. Tighten the wording around model_alias being required. The doc currently says "the config should keep a primary model_alias field because startup health checks collect that field…". Because _run_model_health_check_if_needed does model_aliases.add(config.model_alias) unconditionally for any column type whose impl inherits ColumnGeneratorWithModelRegistry, it's effectively required, not advisory — without it, plugin users get an AttributeError from inside the health-check loop before the friendlier registry "alias not found" error ever runs. Suggest something like "The config must include a model_alias: str field — startup health checks read it directly off any column config whose generator inherits from ColumnGeneratorWithModelRegistry (including via ColumnGeneratorWithModel)."

  2. Show the PairwiseJudgeColumnConfig alongside the multi-model generator example. The single-model example shows both halves; the multi-model one only imports PairwiseJudgeColumnConfig from data_designer_pairwise_judge.config, which makes it harder for readers to see that the config defines both model_alias and judge_model_alias. A small config snippet (or an inline comment) closes the loop with point 1 and makes it visually obvious which alias gets the standard health check vs which one only gets the _validate() resolution.

  3. Sharpen the alias-validation note. "Validate additional alias fields in _validate()… so missing aliases fail before generation starts" is true, but readers may infer a model health check happens. Something like "get_model_config(alias) only verifies the alias is registered; it does not call the endpoint. Endpoint reachability is only exercised for the primary model_alias collected by the standard startup health check."

  4. Tiny copy nit: "The engine already builds a ResourceProvider for each generator" reads as one-per-generator; in practice it's one ResourceProvider per builder shared with each generator. Easy fix: "…builds a ResourceProvider and exposes its model registry to every generator at:".
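
To illustrate the distinction drawn in items 1 and 3, here is a toy sketch of "registration check" versus "health check". Every class and function below is a hypothetical stand-in written for this comment; only the method name get_model_config and the two alias field names come from the discussion above, and the real engine behavior may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModelRegistry:
    """Hypothetical stand-in for a model registry keyed by alias."""

    configs: dict[str, dict] = field(default_factory=dict)
    pinged: list[str] = field(default_factory=list)

    def get_model_config(self, alias: str) -> dict:
        # Registration check only: no endpoint is contacted here.
        if alias not in self.configs:
            raise KeyError(f"model alias not found: {alias!r}")
        return self.configs[alias]

    def health_check(self, alias: str) -> None:
        # Stand-in for an endpoint ping; records which aliases were exercised.
        self.get_model_config(alias)
        self.pinged.append(alias)


@dataclass
class ToyPairwiseJudgeConfig:
    model_alias: str
    judge_model_alias: str


def startup(config: ToyPairwiseJudgeConfig, registry: ToyModelRegistry) -> None:
    # As described above: only the primary alias gets the health check.
    registry.health_check(config.model_alias)
    # What a plugin's _validate() can add: confirm the secondary alias is registered.
    registry.get_model_config(config.judge_model_alias)


registry = ToyModelRegistry(configs={"writer": {}, "judge": {}})
startup(ToyPairwiseJudgeConfig("writer", "judge"), registry)
```

After startup, only "writer" appears in registry.pinged, even though both aliases resolved. A missing secondary alias still fails early with the registry error, but an unreachable secondary endpoint would not be noticed at this stage.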

Engine-side limitation surfaced by the review

The reason point 3 needs to exist at all is that secondary aliases on a packaged plugin config can't be opted into the standard startup health check today — only CustomColumnConfig.model_aliases (plural) is rolled in via an isinstance branch in the builder. For a packaged plugin with model_alias + judge_model_alias, only the primary alias gets the endpoint ping; the secondary alias's reachability and credentials only surface at first generation call.

I filed #606 to propose a small fix: a get_model_aliases() accessor on SingleColumnConfig that defaults to [self.model_alias] (preserving current behavior) and that plugin configs override to declare every alias they depend on. The builder's isinstance(config, CustomColumnConfig) branch collapses into the same path, and the docs in this PR can switch from "validate manually in _validate()" to "override get_model_aliases()". Happy to do that as a follow-up to #603 once the docs land, or fold it in if you'd rather ship them together.
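
A rough sketch of the shape proposed in #606, for discussion only. The accessor name get_model_aliases() and its default come from the paragraph above; the Toy classes and the collector function are hypothetical and do not reflect the actual SingleColumnConfig or builder code.

```python
from dataclasses import dataclass


@dataclass
class ToySingleColumnConfig:
    """Hypothetical stand-in for a column config with one model alias."""

    model_alias: str

    def get_model_aliases(self) -> list[str]:
        # Proposed default: only the primary alias, preserving current behavior.
        return [self.model_alias]


@dataclass
class ToyPairwiseJudgeColumnConfig(ToySingleColumnConfig):
    judge_model_alias: str

    def get_model_aliases(self) -> list[str]:
        # A plugin config overrides the accessor to declare every alias it depends on.
        return [self.model_alias, self.judge_model_alias]


def collect_aliases_for_health_check(configs) -> set[str]:
    """Single code path for all configs, with no isinstance branch."""
    aliases: set[str] = set()
    for config in configs:
        aliases.update(config.get_model_aliases())
    return aliases


aliases = collect_aliases_for_health_check(
    [ToySingleColumnConfig("writer"), ToyPairwiseJudgeColumnConfig("writer", "judge")]
)
```

With this shape, the secondary alias joins the health-check set without the builder needing to know about any specific config type.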

Verdict on the docs alone: the substance is right, the four items above are polish, and the multi-model alias story will be much cleaner once #606 lands.

Comment thread docs/plugins/build_your_own.md
Comment thread docs/plugins/available.md Outdated
@andreatgretel
Contributor

Bundling the code reference reorg with plugins-graduation makes sense given plugin authors now need to navigate engine.column_generators, engine.processing.processors, etc. as a real public surface. The thing I'd still flag: the reorg landed across ~10 commits and grew the diff from 19 files to 72, which makes reviewing harder and links the two for reverts. Not blocking, just wanted to surface it. A sentence in the PR description tying the two together would help future readers.

@johnnygreco
Contributor Author

@nabinchha thanks for the close read. I pushed 4c3365ee to address the PR-local docs items:

  • made model_alias: str required wording explicit for ColumnGeneratorWithModelRegistry / ColumnGeneratorWithModel configs
  • added the missing PairwiseJudgeColumnConfig snippet so both model_alias and judge_model_alias are visible next to the multi-model generator example
  • clarified that get_model_config(alias) only validates registration, and that endpoint reachability is only covered by the standard startup health check for the primary model_alias
  • fixed the ResourceProvider wording to describe the shared provider exposed to generators

I left the #606 engine/API improvement as follow-up rather than documenting get_model_aliases() in this PR before it exists.

Contributor

@andreatgretel andreatgretel left a comment

Three follow-ups from a final consistency pass — one is a real failure on the default engine. Posting as a single review since GitHub's per-comment endpoint is currently 422-ing.

docs/plugins/build_your_own.md:240 (processor example)

Codex caught this: this example raises DatasetProcessingError on the default engine. dataset_builder.py:72 sets DATA_DESIGNER_ASYNC_ENGINE default to "1", the async path passes strict_row_count=True into the runner (dataset_builder.py:462,471), and processor_runner.py:86,107 raises on any row-count delta. Filter via data[mask].reset_index(...) returns a shorter frame, so the very first batch fails. The comment at dataset_builder.py:69-71 also says the sync engine is scheduled for removal after one transitional release, so DATA_DESIGNER_ASYNC_ENGINE=0 isn't a durable fix.

Two clean options: (a) move the filter to process_after_generation() so the example is row-count-stable everywhere, or (b) keep process_before_batch and add an inline !!! warning linking to concepts/processors.md#row-count-changes so readers don't copy-paste a broken example. Leaning toward (a) since "filter rows by regex" is the obvious first plugin people will write. wdyt?
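
For anyone skimming the thread, the failure mode reduces to the sketch below. It uses plain lists of dicts in place of DataFrames so it runs without dependencies, and every name in it is hypothetical except strict_row_count, which is taken from the description above. The after-generation stage is modeled as a non-strict call, on the assumption (per this thread) that row-count changes are allowed there.

```python
class ToyProcessingError(Exception):
    """Hypothetical stand-in for the engine's DatasetProcessingError."""


def keep_rows_with_digit(rows: list[dict]) -> list[dict]:
    # A filtering processor: may return fewer rows than it received.
    return [row for row in rows if any(ch.isdigit() for ch in row["text"])]


def run_stage(rows: list[dict], processor, strict_row_count: bool) -> list[dict]:
    """Apply a processor, optionally rejecting any row-count change."""
    result = processor(rows)
    if strict_row_count and len(result) != len(rows):
        raise ToyProcessingError(
            f"processor changed row count from {len(rows)} to {len(result)}"
        )
    return result


batch = [{"text": "alpha-1"}, {"text": "beta"}]

# Per-batch stage with strict row counts: the filter is rejected.
try:
    run_stage(batch, keep_rows_with_digit, strict_row_count=True)
    strict_failed = False
except ToyProcessingError:
    strict_failed = True

# After-generation stage, modeled here as a non-strict call over the full dataset.
final_rows = run_stage(batch, keep_rows_with_digit, strict_row_count=False)
```

The same filter function fails in the strict per-batch stage and succeeds once it runs where row-count changes are tolerated, which is the argument for option (a).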

docs/code_reference/interface/errors.md:3 (intro paragraph)

The intro says this page covers errors at the public API boundary, but DataDesignerEarlyShutdownError isn't documented here. It's exported from data_designer.interface (interface/__init__.py:13-22), defined with a real docstring in interface/errors.py (subclass of DataDesignerGenerationError, raised on early shutdown with no records), and concepts/architecture-and-performance.md:290-300 already has a try/except DataDesignerEarlyShutdownError: example telling users to catch it for retry-with-different-alias flows. Worth a fourth ::: section to match the other three.
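
As a reminder of why the subclass relationship matters for readers of that page, here is a self-contained sketch of the retry-with-a-different-alias flow. The Toy error classes mirror only the inheritance described above; the generate function, the alias names, and the fallback helper are invented for illustration and are not the data_designer API.

```python
class ToyGenerationError(Exception):
    """Stand-in for a general generation error."""


class ToyEarlyShutdownError(ToyGenerationError):
    """Stand-in for an early shutdown that produced no records."""


def generate(alias: str) -> list[str]:
    # Hypothetical generator: the "flaky" alias always shuts down early.
    if alias == "flaky":
        raise ToyEarlyShutdownError(f"early shutdown with alias {alias!r}")
    return [f"record from {alias}"]


def generate_with_fallback(aliases: list[str]) -> tuple[str, list[str]]:
    """Try each alias in order, moving on only after an early shutdown."""
    for alias in aliases:
        try:
            return alias, generate(alias)
        except ToyEarlyShutdownError:
            continue
    raise ToyGenerationError("every alias shut down early")


used_alias, records = generate_with_fallback(["flaky", "stable"])
```

Catching the subclass specifically lets other generation errors propagate, which is the behavior users need documented alongside the other three error classes.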

docs/code_reference/engine/seed_readers.md (stub-only symbols)

nit: 5 of the 7 symbols surfaced on this page have no docstring, so they'll render as near-empty stubs in the generated reference: SeedReaderFileSystemContext, SeedReaderBatch, SeedReaderBatchReader, PandasSeedReaderBatch, and create_seed_reader_output_dataframe (engine/resources/seed_reader.py:53,58,62,67,74). SeedReader and FileSystemSeedReader both have real prose and are the load-bearing entries. Either add one-line docstrings for the five protocol/dataclass/helper symbols, or drop them from the page and let mkdocstrings only render what has documentation. wdyt?


Also a pre-existing prose drift outside the diff hunks worth fixing while you're in this area: docs/concepts/processors.md:91 says outputs land in processors-outputs/{name}/, but the actual on-disk folder is processors-files/. engine/storage/artifact_storage.py:31 defines PROCESSORS_OUTPUTS_FOLDER_NAME = "processors-files", and data-designer-config/.../processors.py:69 agrees. Only the prose disagrees with what gets written.

@johnnygreco
Copy link
Copy Markdown
Contributor Author

@andreatgretel thanks for the final pass. I pushed 2403fc4a to address these:

  • changed the regex-filter processor example to use process_after_generation() so row filtering works on the default async engine
  • added DataDesignerEarlyShutdownError to the interface errors reference page
  • added one-line docstrings for the seed-reader context, batch protocols, pandas batch wrapper, and output DataFrame helper rendered on the seed readers reference page
  • fixed the processor output path prose from processors-outputs/{name}/ to processors-files/{name}/

Validation: uv run --group docs mkdocs build passes with the existing docs warnings.

andreatgretel previously approved these changes May 6, 2026
Griffe (used by mkdocstrings) skips directories without __init__.py
when resolving module paths, which prevented the new plugins code
reference from rendering SeedReader, FileSystemSeedReader, and
Processor. Adding empty __init__.py files in engine/resources/,
engine/processing/, and engine/processing/processors/ aligns with
the convention already used in engine/mcp/, engine/models/, etc.

Plugin authors now see meaningful descriptions for every field and
method on the bases rendered in the plugins code reference:

- Plugin and PluginType: class docstrings + Attributes tables for
  fields and enum members; fix typo in config_qualified_name field
  description.
- SingleColumnConfig: document allow_resize.
- ProcessorConfig: document processor_type discriminator.
- SeedSource: document seed_type discriminator.
- FileSystemSeedSource: add class docstring + Attributes table for
  path / file_pattern / recursive.
- ColumnGeneratorFullColumn and ColumnGeneratorCellByCell: add
  class docstrings explaining when to use each base, plus method
  docstrings on the abstract generate() implementations.

Restructures plugin documentation around the now-stable extension
points (column generator, seed reader, processor) and treats plugins
as a first-class story for customizing Data Designer.

- Add code_reference/plugins.md: single-stop reference for the Plugin
  object and the config + implementation base classes used by all
  three plugin types.
- Add code_reference/generators.md: column generator implementation
  base classes, separated from column configs.
- Surface SingleColumnConfig in code_reference/column_configs.md.
- Add plugins/implement.md ("Build Your Own"): per-type implementation
  instructions across column generators, seed readers, and processors.
- Add plugins/processor.md: complete processor plugin package example.
- Rewrite plugins/overview.md: open with why plugins exist, drop the
  internal-helpers note (PluginRegistry / PluginManager), and focus
  the guide on what plugin builders need.
- Refresh plugins/available.md (Catalog) and
  plugins/filesystem_seed_reader.md to match the new structure.
- Delete plugins/example.md (replaced by per-type guides).
- Reorder Code Reference nav alphabetically and add the new pages.
- Minor link / wording fixes in concepts/processors.md and
  concepts/deployment-options.md.
johnnygreco added 17 commits May 6, 2026 20:33
Replace the overview's how-to walkthrough and the per-type plugin
guides with a single Build Your Own page that covers all three
plugin types side-by-side. Add a dedicated Using Models in Plugins
guide and a seed_readers code reference, and trim the overview down
to what the plugin types are, how to use one, and how discovery
works.

- Rename plugins/implement.md to plugins/build_your_own.md.
- Delete plugins/filesystem_seed_reader.md and plugins/processor.md
  (their content is now in build_your_own.md and the per-type code
  references).
- Add plugins/models.md for model-backed column generator authoring.
- Add code_reference/seed_readers.md for seed reader implementation
  base classes.
- Rewrite plugins/overview.md: shorter intro, type bullets link to
  the relevant code reference, drop the multi-step "How do you
  create plugins" walkthrough in favor of a single Build a Plugin
  pointer, tighten Discovery troubleshooting.
- Refresh plugins/available.md (Available Plugins): point to the
  DataDesignerPlugins catalog and explain how to request a community
  listing.
- Update cross-page links in concepts/processors.md,
  concepts/seed-datasets.md, recipes/plugin_development/markdown_seed_reader.md,
  code_reference/plugins.md, and code_reference/generators.md to
  match the new structure.
- Update mkdocs.yml nav: rename to Build Your Own, add Using Models,
  add seed_readers code reference.

Code-heavy reference tables (plugin bases, column generators, etc.)
were wrapping aggressively on narrow viewports, breaking long
identifiers across multiple lines. Switch the table container to
horizontal overflow and prevent code cells from wrapping so
identifiers stay readable.
- Add an Implementation base section to code_reference/processors.md
  rendering the engine-side Processor class. This justifies the
  engine/processing/__init__.py files added earlier and gives
  processor plugin authors an auto-rendered API reference, matching
  the pattern used by code_reference/generators.md and seed_readers.md.
- build_your_own.md: replace the placeholder "x" emoji on the
  IndexMultiplier example with the actual multiplication sign.
- build_your_own.md: drop the manual `re.compile + apply(lambda)`
  pattern in the regex-filter processor in favor of the idiomatic
  `Series.str.contains(..., regex=True)`.
- build_your_own.md: add a kernel-restart caveat after the editable
  install instructions — PluginRegistry caches discovery on first
  import, so notebooks need a fresh kernel to pick up freshly
  installed plugins.
- build_your_own.md: state explicitly what `assert_valid_plugin`
  checks (config base + plugin-type-appropriate impl base).
- code_reference/plugins.md: link out to the processors code
  reference alongside generators and seed_readers.
@johnnygreco johnnygreco force-pushed the johnny/docs/plugins-out-of-experimental-mode branch from c49880f to aae652e Compare May 6, 2026 20:33
@johnnygreco johnnygreco merged commit 8b8d748 into main May 6, 2026
50 checks passed

3 participants