docs: graduate plugins out of experimental mode #603
|
Docs preview: https://2674aef1.dd-docs-preview.pages.dev
|
Review: PR #603 —
|
Greptile Summary
This PR graduates the Data Designer plugin system out of experimental status by restructuring the plugin docs (replacing the old example and filesystem-seed-reader guides with consolidated "Build Your Own" and "Using Models" pages), reorganizing the Code Reference nav into Config / Engine / Interface package groups, and expanding docstrings throughout so generated API docs are useful. It also adds
|
| Filename | Overview |
|---|---|
| mkdocs.yml | Nav restructured from flat reference pages to Config/Engine/Interface groups; url_download: true added to pymdownx.snippets for future remote snippet support; all entries correctly reference new file paths. |
| docs/plugins/build_your_own.md | New consolidated plugin authoring guide with working examples for all three plugin types (column generator, seed reader, processor); replaces example.md and filesystem_seed_reader.md. |
| docs/plugins/models.md | New guide for model-backed plugin patterns using ColumnGeneratorWithModel and ColumnGeneratorWithModelRegistry; examples and registry access patterns are accurate to the codebase. |
| packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py | Docstrings added to ColumnGeneratorCellByCell and ColumnGeneratorFullColumn; abstract generate() methods now have docstring bodies instead of ellipsis — valid Python, subclasses still must override. |
| packages/data-designer-config/src/data_designer/plugins/plugin.py | Class-level docstrings added to PluginType and Plugin; typo fix in config_qualified_name field description. |
| docs/concepts/processors.md | Corrected output directory reference from processors-outputs/ to processors-files/ matching the actual PROCESSORS_OUTPUTS_FOLDER_NAME constant; plugin authoring prose trimmed with link to new Build Your Own page. |
| docs/css/mkdocstrings.css | New CSS rules make wide auto-generated API tables horizontally scrollable with fixed column widths and wrappable code annotations. |
| packages/data-designer-engine/src/data_designer/engine/processing/__init__.py | New empty __init__.py (license header only) added so mkdocstrings/griffe can discover the processing subpackage. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph Plugins Nav
PO[Overview]
PB[Build Your Own]
PM[Using Models]
PA[Available Plugins]
end
subgraph Code Reference Nav
CR[Overview]
subgraph Config
CC[analysis / column_configs / config_builder / data_designer_config / mcp / models / plugins / processors / run_config / sampler_params / seeds / validator_params]
end
subgraph Engine
CE[column_generators / mcp / processors / seed_readers]
end
subgraph Interface
CI[data_designer / errors / results]
end
end
subgraph Removed
EX[example.md]
FS[filesystem_seed_reader.md]
FLAT[Flat reference pages]
end
EX -->|replaced by| PB
FS -->|replaced by| PB
FLAT -->|replaced by| Config
FLAT -->|replaced by| Engine
FLAT -->|replaced by| Interface
subgraph Discovery Shims
PI1[engine/processing/__init__.py]
PI2[engine/processing/processors/__init__.py]
PI3[engine/resources/__init__.py]
end
PI1 & PI2 & PI3 -->|enables griffe to find| CE
Reviews (9): Last reviewed commit: "docs: update available plugins page"
The PR description says these As shipped, only column generators get an actual mkdocstrings-rendered API reference (via Either land |
- Add an Implementation base section to code_reference/processors.md rendering the engine-side Processor class. This justifies the engine/processing/__init__.py files added earlier and gives processor plugin authors an auto-rendered API reference, matching the pattern used by code_reference/generators.md and seed_readers.md.
- build_your_own.md: replace the placeholder "x" emoji on the IndexMultiplier example with the actual multiplication sign.
- build_your_own.md: drop the manual `re.compile + apply(lambda)` pattern in the regex-filter processor in favor of the idiomatic `Series.str.contains(..., regex=True)`.
- build_your_own.md: add a kernel-restart caveat after the editable install instructions: PluginRegistry caches discovery on first import, so notebooks need a fresh kernel to pick up freshly installed plugins.
- build_your_own.md: state explicitly what `assert_valid_plugin` checks (config base + plugin-type-appropriate impl base).
- code_reference/plugins.md: link out to the processors code reference alongside generators and seed_readers.
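The kernel-restart caveat follows from discovery being cached. A minimal sketch of that behavior, assuming a registry that memoizes its first scan; `discover_plugins`, `_INSTALLED`, and the plugin names are hypothetical stand-ins, not Data Designer's `PluginRegistry` API:

```python
from functools import lru_cache

# Hypothetical stand-in for the set of installed plugin entry points.
_INSTALLED: list[str] = ["index-multiplier"]


@lru_cache(maxsize=None)
def discover_plugins() -> tuple[str, ...]:
    """Scan once and memoize the result, like a registry that caches on first use."""
    return tuple(_INSTALLED)


first_scan = discover_plugins()

# Simulate an editable install of a new plugin while the process keeps running.
_INSTALLED.append("regex-filter")
second_scan = discover_plugins()  # still the cached result, new plugin is missing

# Clearing the cache plays the role of a kernel restart in this sketch.
discover_plugins.cache_clear()
after_restart = discover_plugins()
```

In a notebook there is usually no supported way to clear such a cache by hand, which is why restarting the kernel is the advice given.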
|
@andreatgretel Re: #603 (comment) This is addressed in the current PR state. The branch now has |
08d153d to 1e93465
|
All contributors have signed the DCO ✍️ ✅ |
|
@nabinchha could you take a close look at the Using Models in Plugins section? We want to establish a good pattern for using models in plugins, especially the recommended base class split for single-model vs multi-model generators and the alias validation / health-check behavior. |
b84bae9 to a88c702
|
@johnnygreco — took a close pass at Things I'd change in this PR
Engine-side limitation surfaced by the review The reason point 3 needs to exist at all is that secondary aliases on a packaged plugin config can't be opted into the standard startup health check today — only I filed #606 to propose a small fix: a Verdict on the docs alone: the substance is right, the four items above are polish, and the multi-model alias story will be much cleaner once #606 lands. |
|
Bundling the code reference reorg with plugins-graduation makes sense given plugin authors now need to navigate |
|
@nabinchha thanks for the close read. I pushed
I left the #606 engine/API improvement as follow-up rather than documenting |
Three follow-ups from a final consistency pass — one is a real failure on the default engine. Posting as a single review since GitHub's per-comment endpoint is currently 422-ing.
docs/plugins/build_your_own.md:240 (processor example)
Codex caught this: this example raises DatasetProcessingError on the default engine. dataset_builder.py:72 sets DATA_DESIGNER_ASYNC_ENGINE default to "1", the async path passes strict_row_count=True into the runner (dataset_builder.py:462,471), and processor_runner.py:86,107 raises on any row-count delta. Filter via data[mask].reset_index(...) returns a shorter frame, so the very first batch fails. The comment at dataset_builder.py:69-71 also says the sync engine is scheduled for removal after one transitional release, so DATA_DESIGNER_ASYNC_ENGINE=0 isn't a durable fix.
Two clean options: (a) move the filter to process_after_generation() so the example is row-count-stable everywhere, or (b) keep process_before_batch and add an inline !!! warning linking to concepts/processors.md#row-count-changes so readers don't copy-paste a broken example. Leaning toward (a) since "filter rows by regex" is the obvious first plugin people will write. wdyt?
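To make the failure mode concrete, here is a minimal sketch in plain pandas. `run_strict` and `RowCountError` are hypothetical stand-ins for the runner's strict row-count check and `DatasetProcessingError`; they are not the engine's actual code, and the column name and pattern are made up.

```python
import pandas as pd


class RowCountError(Exception):
    """Hypothetical stand-in for the engine's DatasetProcessingError."""


def regex_filter(data: pd.DataFrame, column: str, pattern: str) -> pd.DataFrame:
    """Keep only rows whose `column` matches `pattern`; this drops rows."""
    mask = data[column].str.contains(pattern, regex=True, na=False)
    return data[mask].reset_index(drop=True)


def run_strict(processor, data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical runner: reject any output whose row count differs from the input."""
    result = processor(data)
    if len(result) != len(data):
        raise RowCountError(f"row count changed: {len(data)} -> {len(result)}")
    return result


batch = pd.DataFrame({"text": ["keep me", "drop", "keep too"]})

# Called directly, the filter simply returns a shorter frame.
filtered = regex_filter(batch, "text", r"^keep")

# Under a strict row-count check, the same filter fails on the first batch.
try:
    run_strict(lambda df: regex_filter(df, "text", r"^keep"), batch)
    strict_outcome = "ok"
except RowCountError as exc:
    strict_outcome = str(exc)
```

Under these assumptions the filter itself is correct; it is only the per-batch row-count check that rejects it, which is why moving it to a stage without that check (option a) is the lower-risk fix.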
docs/code_reference/interface/errors.md:3 (intro paragraph)
The intro says this page covers errors at the public API boundary, but DataDesignerEarlyShutdownError isn't documented here. It's exported from data_designer.interface (interface/__init__.py:13-22), defined with a real docstring in interface/errors.py (subclass of DataDesignerGenerationError, raised on early shutdown with no records), and concepts/architecture-and-performance.md:290-300 already has a try/except DataDesignerEarlyShutdownError: example telling users to catch it for retry-with-different-alias flows. Worth a fourth ::: section to match the other three.
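For context, the retry flow that makes this exception worth documenting looks roughly like the sketch below. The two exception classes are local stand-ins that mirror the hierarchy described above (the real ones are exported from `data_designer.interface`), and `generate_with_fallback`, `fake_generate`, and the alias names are hypothetical.

```python
from collections.abc import Callable, Sequence


class DataDesignerGenerationError(Exception):
    """Local stand-in for the interface-level generation error."""


class DataDesignerEarlyShutdownError(DataDesignerGenerationError):
    """Local stand-in: raised on early shutdown with no records."""


def generate_with_fallback(
    generate: Callable[[str], list[dict]], aliases: Sequence[str]
) -> list[dict]:
    """Try each model alias in turn, moving on only after an early shutdown."""
    last_error = None
    for alias in aliases:
        try:
            return generate(alias)
        except DataDesignerEarlyShutdownError as exc:
            last_error = exc  # remember the failure and try the next alias
    raise DataDesignerGenerationError("all aliases shut down early") from last_error


def fake_generate(alias: str) -> list[dict]:
    """Hypothetical generator: the primary alias fails, the backup succeeds."""
    if alias == "primary":
        raise DataDesignerEarlyShutdownError("no records produced")
    return [{"alias": alias, "value": 1}]


records = generate_with_fallback(fake_generate, ["primary", "backup"])
```

Because the early-shutdown error subclasses the generation error, callers who only catch the parent class would never see the distinction, which is part of why a dedicated reference entry helps.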
docs/code_reference/engine/seed_readers.md (stub-only symbols)
nit: 5 of the 7 symbols surfaced on this page have no docstring, so they'll render as near-empty stubs in the generated reference: SeedReaderFileSystemContext, SeedReaderBatch, SeedReaderBatchReader, PandasSeedReaderBatch, and create_seed_reader_output_dataframe (engine/resources/seed_reader.py:53,58,62,67,74). SeedReader and FileSystemSeedReader both have real prose and are the load-bearing entries. Either add one-line docstrings for the five protocol/dataclass/helper symbols, or drop them from the page and let mkdocstrings only render what has documentation. wdyt?
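One way to catch stub-only symbols before they reach the rendered reference is a small docstring audit. This is a sketch using only the standard library; `undocumented_symbols` and the throwaway `demo` module are hypothetical, and a real check would be pointed at the seed reader module instead.

```python
import inspect
import types


def undocumented_symbols(module: types.ModuleType) -> list[str]:
    """Return public classes and functions defined in `module` that lack a docstring."""
    missing = []
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue
        if not (inspect.isclass(obj) or inspect.isfunction(obj)):
            continue
        if getattr(obj, "__module__", None) != module.__name__:
            continue  # skip symbols imported from elsewhere
        if not (obj.__doc__ or "").strip():
            missing.append(name)
    return sorted(missing)


# Build a throwaway module with one documented and two undocumented symbols.
demo = types.ModuleType("demo")
exec(
    '''
class Documented:
    """Has prose."""

class Stub:
    pass

def helper():
    return None
''',
    demo.__dict__,
)

missing = undocumented_symbols(demo)
```

One caveat: `@dataclass` fills in an auto-generated `__doc__` at runtime when none is written, so a runtime audit like this one would not flag undocumented dataclasses.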
Also a pre-existing prose drift outside the diff hunks worth fixing while you're in this area: docs/concepts/processors.md:91 says outputs land in processors-outputs/{name}/, but the actual on-disk folder is processors-files/ — engine/storage/artifact_storage.py:31 defines PROCESSORS_OUTPUTS_FOLDER_NAME = "processors-files", and data-designer-config/.../processors.py:69 agrees. Only the prose disagrees with what gets written.
|
@andreatgretel thanks for the final pass. I pushed
Validation: |
Griffe (used by mkdocstrings) skips directories without __init__.py when resolving module paths, which prevented the new plugins code reference from rendering SeedReader, FileSystemSeedReader, and Processor. Adding empty __init__.py files in engine/resources/, engine/processing/, and engine/processing/processors/ aligns with the convention already used in engine/mcp/, engine/models/, etc.
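A quick way to spot directories that would be skipped under this rule is to look for subdirectories that contain Python files but no `__init__.py`. The helper below is a rough filesystem approximation, not Griffe's actual resolution logic; `dirs_missing_init`, the temporary layout, and the `base.py` file name are hypothetical.

```python
import tempfile
from pathlib import Path


def dirs_missing_init(package_root: Path) -> list[str]:
    """List subdirectories that hold .py files (at any depth) but have no __init__.py."""
    missing = []
    for directory in sorted(p for p in package_root.rglob("*") if p.is_dir()):
        has_python = any(directory.rglob("*.py"))
        if has_python and not (directory / "__init__.py").exists():
            missing.append(directory.relative_to(package_root).as_posix())
    return missing


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "engine"

    # A subpackage that already follows the convention.
    (root / "mcp").mkdir(parents=True)
    (root / "mcp" / "__init__.py").touch()

    # Subpackages without __init__.py, as before this change.
    (root / "resources").mkdir()
    (root / "resources" / "seed_reader.py").touch()
    (root / "processing" / "processors").mkdir(parents=True)
    (root / "processing" / "processors" / "base.py").touch()

    found = dirs_missing_init(root)
```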
Plugin authors now see meaningful descriptions for every field and method on the bases rendered in the plugins code reference:
- Plugin and PluginType: class docstrings + Attributes tables for fields and enum members; fix typo in config_qualified_name field description.
- SingleColumnConfig: document allow_resize.
- ProcessorConfig: document processor_type discriminator.
- SeedSource: document seed_type discriminator.
- FileSystemSeedSource: add class docstring + Attributes table for path / file_pattern / recursive.
- ColumnGeneratorFullColumn and ColumnGeneratorCellByCell: add class docstrings explaining when to use each base, plus method docstrings on the abstract generate() implementations.
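On the abstract generate() docstrings: a docstring is a complete function body in Python, so replacing `...` with a docstring leaves the method abstract. A minimal sketch with hypothetical class names (not the real generator bases):

```python
from abc import ABC, abstractmethod


class CellGenerator(ABC):
    """Hypothetical base: produce one value per row."""

    @abstractmethod
    def generate(self, row: dict) -> dict:
        """Return `row` with the generated value added."""


class Upper(CellGenerator):
    """Concrete subclass that overrides generate()."""

    def generate(self, row: dict) -> dict:
        return {**row, "upper": row["text"].upper()}


try:
    CellGenerator()  # still abstract, so instantiation fails
    base_instantiable = True
except TypeError:
    base_instantiable = False

result = Upper().generate({"text": "abc"})
```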
Restructures plugin documentation around the now-stable extension
points (column generator, seed reader, processor) and treats plugins
as a first-class story for customizing Data Designer.
- Add code_reference/plugins.md: single-stop reference for the Plugin
object and the config + implementation base classes used by all
three plugin types.
- Add code_reference/generators.md: column generator implementation
base classes, separated from column configs.
- Surface SingleColumnConfig in code_reference/column_configs.md.
- Add plugins/implement.md ("Build Your Own"): per-type implementation
instructions across column generators, seed readers, and processors.
- Add plugins/processor.md: complete processor plugin package example.
- Rewrite plugins/overview.md: open with why plugins exist, drop the
internal-helpers note (PluginRegistry / PluginManager), and focus
the guide on what plugin builders need.
- Refresh plugins/available.md (Catalog) and
plugins/filesystem_seed_reader.md to match the new structure.
- Delete plugins/example.md (replaced by per-type guides).
- Reorder Code Reference nav alphabetically and add the new pages.
- Minor link / wording fixes in concepts/processors.md and
concepts/deployment-options.md.
Replace the overview's how-to walkthrough and the per-type plugin guides with a single Build Your Own page that covers all three plugin types side-by-side. Add a dedicated Using Models in Plugins guide and a seed_readers code reference, and trim the overview down to what the plugin types are, how to use one, and how discovery works.
- Rename plugins/implement.md to plugins/build_your_own.md.
- Delete plugins/filesystem_seed_reader.md and plugins/processor.md (their content is now in build_your_own.md and the per-type code references).
- Add plugins/models.md for model-backed column generator authoring.
- Add code_reference/seed_readers.md for seed reader implementation base classes.
- Rewrite plugins/overview.md: shorter intro, type bullets link to the relevant code reference, drop the multi-step "How do you create plugins" walkthrough in favor of a single Build a Plugin pointer, tighten Discovery troubleshooting.
- Refresh plugins/available.md (Available Plugins): point to the DataDesignerPlugins catalog and explain how to request a community listing.
- Update cross-page links in concepts/processors.md, concepts/seed-datasets.md, recipes/plugin_development/markdown_seed_reader.md, code_reference/plugins.md, and code_reference/generators.md to match the new structure.
- Update mkdocs.yml nav: rename to Build Your Own, add Using Models, add seed_readers code reference.
Code-heavy reference tables (plugin bases, column generators, etc.) were wrapping aggressively on narrow viewports, breaking long identifiers across multiple lines. Switch the table container to horizontal overflow and prevent code cells from wrapping so identifiers stay readable.
c49880f to aae652e
Summary
Updates the docs around two related areas: plugin authoring now that the extension points are no longer experimental, and the code reference section so APIs are grouped by package/layer and have enough context to be useful from the docs site.
Changes
Added
- `docs/plugins/build_your_own.md` as the consolidated guide for building column generator, seed reader, and processor plugins.
- `docs/plugins/models.md` for model-backed plugin patterns and model registry usage.
- `docs/code_reference/config/`, `docs/code_reference/engine/`, and `docs/code_reference/interface/`, with overview pages for each group.
- `__init__.py` files in engine `resources` and `processing` subpackages so mkdocstrings/griffe can discover seed reader and processor classes.
Changed
- `docs/plugins/available.md`.
- `mkdocs.yml` by Config, Engine, and Interface, with updated cross-links from concepts and recipes.
Removed
- `example.md`, `filesystem_seed_reader.md`, `processor.md`) with the consolidated Build Your Own guide and targeted reference pages.
Attention Areas
- `mkdocs.yml` - navigation moved from flat reference pages to package groups, and remote markdown snippets are now enabled for the plugin catalog table.
- `docs/plugins/available.md` - the NVIDIA plugin table is pulled from the raw DataDesignerPlugins catalog during docs builds.
- `docs/code_reference/` - page paths and anchors changed as part of the Config / Engine / Interface split.
- `packages/data-designer-*/src/data_designer/` docstrings and discovery shims - these are mostly documentation-facing changes, but they affect what mkdocstrings exposes in the generated docs.
Validation
- `uv run --group docs mkdocs build`
Description updated with AI