@cj-zhukov
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the development-process Related to development process of DataFusion label Jan 11, 2026
@cj-zhukov
Contributor Author

High-Level Overview

Previous PRs (#19371 and #19491) introduced a basic script that only parsed example group names.

This PR implements a more comprehensive solution for keeping the examples README up-to-date:

  1. Introduces a new examples.toml file, which contains metadata for examples, including:
  • subcommand — the command used with cargo run --example [group] -- [subcommand]
  • file — the actual Rust source file
  • desc — a short description of the example
  2. Updates the generate_examples_docs.sh script to:
  • Parse examples.toml
  • Check that all examples in the filesystem exist and are listed in the TOML
  • Generate a new README file as README-NEW.md
  3. Integrates CI validation:
  • README-NEW.md is formatted with DataFusion’s Prettier
  • Compared against the committed README.md
  • If there are differences, the CI check fails and prints a clear diff, along with instructions for updating the README after verifying examples.toml
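As a rough illustration of steps 1 and 2, the metadata and the filesystem check could be modeled as below. This is a sketch in Rust rather than the Bash used in this PR. The field names follow the list above, but `ExampleEntry`, `unlisted_files`, `render_table`, and the table layout are assumptions made for illustration, not the actual script or README format.

```rust
use std::collections::BTreeSet;

/// One metadata entry; the three fields mirror the ones listed for examples.toml.
#[derive(Debug, Clone, PartialEq)]
pub struct ExampleEntry {
    pub subcommand: String,
    pub file: String,
    pub desc: String,
}

/// Returns the source files found on disk that have no metadata entry.
/// `files_on_disk` stands in for a directory listing (hypothetical input).
pub fn unlisted_files(entries: &[ExampleEntry], files_on_disk: &[&str]) -> Vec<String> {
    let listed: BTreeSet<&str> = entries.iter().map(|e| e.file.as_str()).collect();
    files_on_disk
        .iter()
        .copied()
        .filter(|f| !listed.contains(f))
        .map(String::from)
        .collect()
}

/// Renders the entries as a Markdown table (the column layout is an assumption).
pub fn render_table(entries: &[ExampleEntry]) -> String {
    let mut out = String::from("| Subcommand | File | Description |\n| --- | --- | --- |\n");
    for e in entries {
        out.push_str(&format!("| {} | {} | {} |\n", e.subcommand, e.file, e.desc));
    }
    out
}
```

With a single entry for `csv.rs` and a directory containing `csv.rs` and `json.rs`, `unlisted_files` would report `json.rs`, which is the kind of mismatch the check is meant to catch.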

This ensures that the README for examples stays in sync with the actual examples and encourages maintainers to keep examples.toml accurate.
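The sync check itself comes down to comparing the generated text with the committed one and reporting where they diverge. A minimal sketch of that idea in Rust follows, assuming both files have already been read into strings. `check_in_sync` is a hypothetical helper, and the real CI step may rely on a diff tool instead.

```rust
/// Compares the generated README with the committed one, line by line.
/// Returns Ok(()) when every line matches, otherwise a message that names
/// the first differing line. Note that `lines()` ignores a trailing newline.
pub fn check_in_sync(generated: &str, committed: &str) -> Result<(), String> {
    let mut generated_lines = generated.lines();
    let mut committed_lines = committed.lines();
    let mut line_no = 0;
    loop {
        line_no += 1;
        match (generated_lines.next(), committed_lines.next()) {
            // Both inputs ended at the same time: no difference found.
            (None, None) => return Ok(()),
            // Lines match: keep scanning.
            (g, c) if g == c => continue,
            // First mismatch (or one input ended early).
            (g, c) => {
                return Err(format!(
                    "README is out of date at line {line_no}:\n- committed: {}\n+ generated: {}",
                    c.unwrap_or("<end of file>"),
                    g.unwrap_or("<end of file>")
                ))
            }
        }
    }
}
```

A CI wrapper would print the error and exit with a non-zero status when this returns `Err`.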

@cj-zhukov
Contributor Author

@Jefffrey since you helped with previous PRs related to example documentation, it would be great if you could take a look at this one as well. Your feedback on the CI check approach or any improvements would be much appreciated.

Contributor

@Jefffrey Jefffrey left a comment

Sorry for the late review. I haven't looked too in-depth but I have some concerns with this approach:

  • A lot of bash scripting; I know we have other scripts like this, but I wonder if this amount of logic is better served as Python or even Rust scripts
    • Some of the Bash scripting behaviour is not ideal either, like how it doesn't capitalize SQL and IO properly
  • Needing to have a separate metadata toml file doesn't seem ideal, but I can't think of another way to handle this other than manually parsing the Rust code of the examples (which is worse)
    • We managed to use a Rust based approach for function docs because we used a Rust binary to access the functions which are exposed as a lib; that won't work for examples 🤔

I don't have too many ideas myself unfortunately, would love if anyone else can chime in 😅

@cj-zhukov
Contributor Author

@Jefffrey Thanks for the review and for raising these points - they’re all fair concerns.

I agree this is pushing Bash beyond simple glue logic. I started there to avoid introducing new dependencies and to stay consistent with existing CI scripts, but if the overall approach makes sense, I’d be happy to follow up with a Rust or Python implementation to improve maintainability.

Regarding examples.toml: I also don’t love having a separate metadata file, but I couldn’t find a reliable single source of truth for example subcommands and descriptions. Parsing Rust source felt more brittle, and unlike function docs, examples aren’t exposed via a library API that we can introspect.

The capitalization issues you noticed (e.g. SQL / IO) are artifacts of the current heuristics and can be fixed - either via normalization or explicit metadata.
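For the normalization option, one possible shape is a small acronym list that gets applied while title-casing a group directory name. The sketch below is only an illustration: the `ACRONYMS` list, the `group_title` name, and the assumption that group directories use snake_case are not taken from the actual script.

```rust
/// Words that should be fully upper-cased in headings (illustrative list).
const ACRONYMS: &[&str] = &["sql", "io", "udf", "csv", "json"];

/// Turns a snake_case directory name into a heading,
/// upper-casing known acronyms and capitalizing every other word.
pub fn group_title(dir_name: &str) -> String {
    dir_name
        .split('_')
        .filter(|word| !word.is_empty())
        .map(|word| {
            let lower = word.to_ascii_lowercase();
            if ACRONYMS.contains(&lower.as_str()) {
                lower.to_ascii_uppercase()
            } else {
                let mut chars = lower.chars();
                match chars.next() {
                    Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
                    None => String::new(),
                }
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

The explicit-metadata alternative would skip the heuristic entirely and store the display title next to each group.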

I’m definitely open to alternative ideas here and would love more input if others have suggestions.

@cj-zhukov
Contributor Author

Based on this discussion, I’m working on a new Rust-based implementation to replace generate_examples_docs.sh.
At the same time, I’m experimenting with a code-driven approach that removes the need for examples.toml, so the README can stay in sync directly with the examples themselves.

Once this new implementation is ready, I’ll share it here so we can review the updated approach and decide whether it’s a better fit.

@cj-zhukov
Contributor Author

I replaced the shell-based examples documentation generator with a Rust implementation.
What changed:

  • Replaced generate_examples_docs.sh with a Rust binary (examples-docs)
  • Removed examples.toml
  • Documentation is now generated directly from structured doc comments in each example group’s main.rs
  • CI script was updated to call the Rust generator instead of the shell script

Design change:

Before:

  • Files lived in examples/<group>/*.rs
  • Metadata (subcommand, description) lived in examples.toml
  • A shell script merged the two

Now:

  • Each example group’s main.rs contains the authoritative documentation for that group
  • The Rust generator parses this information and renders the README directly
  • There is a single source of truth

Personally, I find this approach cleaner and more “Rust-native” than maintaining a complex shell script and a separate TOML file. A follow-up PR could switch this to a dedicated parsing crate (e.g. nom) if the parsing rules become more complex or harder to maintain. That said, I’m very open to feedback and alternative designs if there are concerns about this direction.
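To make the "structured doc comments" idea concrete, here is a minimal sketch of line-based parsing. The comment format it accepts (``//! - `subcommand` (file: name.rs, desc: text)``) is a hypothetical one chosen for illustration and may differ from what the generator in this PR expects. `ExampleDoc`, `parse_doc_line`, and `parse_main_rs` are likewise made-up names.

```rust
/// Metadata recovered from one doc-comment line (hypothetical format).
#[derive(Debug, PartialEq)]
pub struct ExampleDoc {
    pub subcommand: String,
    pub file: String,
    pub desc: String,
}

/// Parses a line shaped like:
///   //! - `csv` (file: csv.rs, desc: Read a CSV file)
/// Returns None for any line that does not match this assumed shape.
pub fn parse_doc_line(line: &str) -> Option<ExampleDoc> {
    let rest = line.trim().strip_prefix("//!")?.trim();
    let rest = rest.strip_prefix("- `")?;
    let (subcommand, rest) = rest.split_once('`')?;
    let meta = rest.trim().strip_prefix('(')?.strip_suffix(')')?;
    // Split at the first comma only, so descriptions may contain commas.
    let (file_part, desc_part) = meta.split_once(',')?;
    let file = file_part.trim().strip_prefix("file:")?.trim();
    let desc = desc_part.trim().strip_prefix("desc:")?.trim();
    Some(ExampleDoc {
        subcommand: subcommand.to_string(),
        file: file.to_string(),
        desc: desc.to_string(),
    })
}

/// Collects every well-formed entry from the text of a group's main.rs.
pub fn parse_main_rs(source: &str) -> Vec<ExampleDoc> {
    source.lines().filter_map(parse_doc_line).collect()
}
```

Hand-written splitting like this stays readable while the format is simple; a parser-combinator crate becomes more attractive once entries span multiple lines or need better error messages.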
