Skip to content

Generate pyo3 schemas for SDK #201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 41 commits into from
Feb 12, 2025
Merged

Conversation

bryfox
Copy link
Contributor

@bryfox bryfox commented Feb 10, 2025

This PR produces well-known Foxglove schemas as pyclasses which can be used by pyo3 in the SDK. For the default channel logging interface, we plan to remove protobuf serialization on the python side, and instead use the existing serialization in SDK core. We've demonstrated that this can be done with little relative overhead; see FG-10358.

Most of the changes here are in generatePyclass.ts; per FG-10347, we're going to simplify build steps between schemas and SDK. For now, this produces output in a pyclass directory, and the build script for the python SDK copies those files into the project. (Open to suggestions here.)

The generator produces two files:

  • the schemas, expressed as pyclass-annotated Rust structs & enums,
  • the .pyi stub interface for the above,

Currently, the submodule is registered with a function call instead of a declarative module. This mechanism is set to be deprecated in pyo3, but integrating with hand-written python isn't straightforward. Once we have the build process unified from FG-10347, it'll be a little easier to make the declarative submodule syntax work.

Follow-up PRs will:

  • add getters or accessors to the schema fields + optional ctor parameters
  • generate schema channel types
  • update the python examples, and re-export the private schemas under the public module

Copy link

linear bot commented Feb 10, 2025

@bryfox
Copy link
Contributor Author

bryfox commented Feb 10, 2025

@amacneil I think you have the vision for unifying the codegen (FG-10347). This is an intermediate step, and it's a little messy since the generated source is coupled to SDK source — ideas welcome, or we can plan to clean it up as part of that issue.

@@ -142,6 +150,43 @@ async function main({ outDir, rosOutDir }: { outDir: string; rosOutDir: string }
generateMarkdown(Object.values(foxgloveMessageSchemas), Object.values(foxgloveEnumSchemas)),
);
});

await logProgress("Generating Pyclass definitions", async () => {
const dir = await fs.mkdir(path.join(outDir, "pyclass"), { recursive: true });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest writing these directly to the python directory, like we're doing for the typescript types.

The top level schemas directory is really just to make it easy for people to find if they want to copy/paste the .proto or other files into their own project. I can't think of any other reason to save the generated python code here.

We should try to keep the generated code in its own directory though, so we can add a top level yarn clean. I'll make a PR with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: #202

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done now. I wonder if this shouldn't even be part of the same update task; they feel spiritually different, and contributors to schemas don't need to worry about this one. This is especially true since (per comments below) the pyclass generation now has optional dependencies on cargo, poetry, and black.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned this up with your changes above. For now, I put this behind a --include-sdk flag, which keeps the diff smaller. I do think a separate command might be worthwhile, but that could probably wait for FG-10347.

Copy link
Contributor

@eloff eloff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These produce Vec u8 for bytes, are we going to punt on the performance issue for now and hope it gets fixed upstream?

Still looking this over.

/// A primitive representing an arrow
#[pyclass]
#[derive(Clone)]
pub(crate) struct ArrowPrimitive {
Copy link
Contributor

@eloff eloff Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we taking this approach of duplicating the fields for the struct instead of wrapping the foxglove::schemas type?

pub(crate) struct ArrowPrimitive(foxglove::schemas::ArrowPrimitive);

Are there problems with that approach we can't overcome? I don't remember the reason we're doing it this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think we can do it that way; I'll update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done with 0f6b5c7

/// An enumeration indicating how input points should be interpreted to create lines
#[pyclass(eq, eq_int)]
#[derive(PartialEq, Clone)]
pub(crate) enum LinePrimitiveLineType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on generating each of these as separate files?

if not, at least alphabetically sorting them would be a good idea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I guess maybe they are alpha-sorted after the enums? I can't tell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structs are sorted since the input is. We could generate separate files, but this follows the convention of our protobuf gen, and I'm not sure what the benefit would be — these files are consumed by pyo3, not end users.

I did update the .pyi stub so that all definitions are ordered alphabetically.

@@ -0,0 +1,1921 @@
//! Definitions for well-known Foxglove schemas
//! Generated by https://github.com/foxglove/schemas
Copy link
Contributor

@amacneil amacneil Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if possible to put the generated files in a subdirectory, rather than mixed in with normal code, so that we can easily clean it prior to regenerate.

In #202 I've started consistently doing this, I also found a bug in #203 as a result of not fully cleaning generated output directories.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated rust files are now in a subdirectory. I'll clean this up a big once #202 merges.

I don't see an easy way to do the same for the .pyi stubs, since (I think) they need to fit into the expected module hierarchy, and that doesn't all come from codegen.

@@ -51,4 +51,4 @@ warn_unreachable = true

[tool.black]
include = '\.pyi?$'
extend-exclude = "python/foxglove/_protobuf"
extend-exclude = "python/foxglove/(_protobuf|_foxglove_py/schemas.pyi)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could run black after generating code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 The typescript script now runs black and rustfmt. If either tool isn't installed, this step is skipped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make sure the format step always runs in CI (e.g. fails if either tool is not installed), so that we don't mark the PR as green unless the code is correctly formatted.

e.g. from this step https://github.com/foxglove/foxglove-sdk/blob/main/.github/workflows/ci.yml#L76-L89

@bryfox bryfox requested a review from amacneil February 11, 2025 21:52
Copy link
Contributor

@gasmith gasmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG in general. A few comments.

// Pyi stub file
await fs.writeFile(pythonSdkStub, generatePymoduleStub([...enumSchemas, ...messageSchemas]));
try {
spawnSync("poetry", ["run", "black", path.resolve(pythonSdkStub)], spawnOpts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're not checking the result from spawnSync. I noticed because on my system, yarn generate doesn't format schemas.pyi. I think that's because when I do poetry run in the top-level directory, it fails with Command not found: black.

For fixing the poetry run failure, maybe: spawnSync("poetry", ["--directory", path.resolve(pythonSdkRoot), "run", ...]).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #209 I added an exec() helper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that. I'm now checking status and will refactor if #209 merges first.

For fixing the poetry run failure

This should work; I think you want to run poetry install in the repo root now. I updated the Contributing doc here, and can flesh that out more if needed.

* Generate a .pyi stub for the given schemas.
*/
export function generatePymoduleStub(schemas: FoxgloveSchema[]): string {
const header = ["from enum import Enum", "from typing import List"].join("\n") + "\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's a add a # Generated by ... line too.

export function generatePymodule(schemas: FoxgloveSchema[]): string {
const header =
[
`use pyo3::prelude::*;\n`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Generated by ...

/**
* Generate a rust module which exports the given schemas annotated with `pymodule`.
*/
export function generatePymodule(schemas: FoxgloveSchema[]): string {
Copy link
Contributor

@gasmith gasmith Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the last file is not part of the SDK build, and the submodule is registered with a function call instead. This mechanism is set to be deprecated in pyo3. Once we have the build process unified from FG-10347, it'll be a little easier to make the declarative submodule syntax work.

I feel like we should leave this out until we're actually using it so that it doesn't rot.

Why does this need to wait for unified codegen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need to wait, and based on #209 that's not going to get us any closer. The problem now is that our exported python modules are a mix of generated and python-native code, and I can't (as far as I know) reference the generated module declaratively per rust-lang/rust#54727.

I've removed it for now; tracked by FG-10457.

amacneil added a commit that referenced this pull request Feb 12, 2025
### Changelog
None

### Docs
None

### Description
Extended `yarn generate` script to call `cargo run --bin
foxglove-proto-gen`, required to generate the rust code.

The rust generation code cannot be directly ported to typescript because
it relies on a rust library (prost) to generate its protobuf
implementation.

The only remaining codegen that does not live in `yarn generate` is for
python; this will become obsolete once
#201 lands.
@@ -115,6 +115,7 @@ jobs:
- uses: actions/setup-python@v5
with:
python-version: "3.9"
- run: rustup component add rustfmt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use setup-rust instead? this feels a bit out of place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, much better

### Generate schemas for the Python SDK

```bash
yarn generate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to this file and comments about Python don't seem relevant to the rust foxglove package

Comment on lines 75 to 83
const pythonSdkRoot = path.resolve(repoRoot, "python", "foxglove-sdk");
const pythonSdkGeneratedRoot = path.join(pythonSdkRoot, "src", "generated");
const pythonSdkStub = path.join(
pythonSdkRoot,
"python",
"foxglove",
"_foxglove_py",
"schemas.pyi",
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi with path.join you can just use / directory separators, it will work cross platform (also I doubt this generate script will work on windows anyway, with all the subcommands it calls)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const pythonSdkRoot = path.resolve(repoRoot, "python", "foxglove-sdk");
const pythonSdkGeneratedRoot = path.join(pythonSdkRoot, "src", "generated");
const pythonSdkStub = path.join(
pythonSdkRoot,
"python",
"foxglove",
"_foxglove_py",
"schemas.pyi",
);
const pythonSdkRoot = path.resolve(repoRoot, "python/foxglove-sdk");
const pythonSdkGeneratedRoot = path.join(pythonSdkRoot, "src/generated");
const pythonSdkStub = path.join(pythonSdkRoot, "python/foxglove/_foxglove_py/schemas.pyi");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I pushed that change after seeing your comment but before the suggestion here.

@bryfox bryfox requested review from amacneil, eloff and gasmith February 12, 2025 20:03
@bryfox bryfox merged commit 445ed39 into main Feb 12, 2025
28 checks passed
@bryfox bryfox deleted the bryan/fg-6783-generate-pyo3-schemas branch February 12, 2025 20:20
bryfox added a commit that referenced this pull request Feb 12, 2025
This builds on #201 to add `pyclass`-annotated structs for typed
channels, with the corresponding interface stub.

We can't export generic classes to Python, so we provide one for each
well-known Foxglove schema. Following the Rust SDK's logic when
implementing the `Encode` trait, this omits implementations for any
"Primitive" schemas.

It also exports a `PartialMetadata` type for the logging interface.

A follow-up PR will need to re-work the python module hierarchy a bit,
so that these modules can be used directly from `foxglove`, but the
changes here are enough to get the following to work:

```py
from foxglove._foxglove_py.schemas import SceneUpdate
from foxglove._foxglove_py.channels import SceneUpdateChannel
ch = SceneUpdateChannel("/test")
# ch.log(1) # TypeError
ch.log(SceneUpdate(entities=[], deletions=[]))
```
bryfox added a commit that referenced this pull request Feb 12, 2025
This PR integrates the new generated Schema and typed Channel classes
from #201 and #206 respectively, and updates the examples accordingly.

As part of this, I've adjusted the Schema classes to make them easier to
work with:
- fields are now `Optional`, with defaults. Callers can omit any
properties during construction.
- constructors are keyword-only, which should be more future-proof.

The protobuf dependency is no longer needed in the Python SDK.

For discussion: I've also re-worked the high-level API a bit. It works
with any given foxglove Schema now (though could use better typing). But
it no longer works with arbitrary protobuf messages. We could support
that, but I'm not sure how valuable it is. We could support arbitrarily
serialized data, but then you'd need to provide your schema, and the
channel interface seems much nicer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants