docs: Package structure RFC #1
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,135 @@ | ||
- Feature Name: `package-structure` | ||
- Start Date: 2024-10-24 | ||
- RFC PR: [CMIP-REF/rfcs#0001](https://github.com/CMIP-REF/rfcs/pull/0001) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
This RFC proposes how we should structure the git repository to enable the rapid development of the project while it is immature. | ||
The core of the project will have some reuse outside of the REF frontend | ||
so it should be publishable as a separate package. | ||
The goal is to make it easier for users to extend the project with new metrics providers. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
* At the start of this project we will be in a state of flux until we have a working prototype. | ||
APIs will change, and we will be adding and removing features. | ||
* Each metrics package will come with its own set of dependencies. Odds are that they will clash with other packages. | ||
* We want something that is easy to extend in future, incorporating new metrics providers/metrics. | ||
* Consistent tooling across the project | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
It is proposed that we start with a mono-repo structure consisting of multiple packages, | ||
each with its own set of dependencies. | ||
The tooling and CI will be shared across all projects. | ||
There will be a `packages` directory at the root of the repo, with each package in its own subdirectory. | ||
An example has been implemented in [CMIP-REF/cmip-ref#1](https://github.com/CMIP-REF/cmip-ref/pull/0001), | ||
which contains a `ref-core` package and an example `ref-metrics-example` package. | ||
|
||
Each package will have a `README.md` file that describes the purpose of the package and how to use it, | ||
its own `pyproject.toml` describing the package and its dependencies, | ||
and its own set of tests. | ||
The test suite for each package is run with only the dependencies required for that package. | ||
|
||
A common set of integration tests will also be run to ensure that the packages work together. | ||
|
||
The purpose of this structure is to split the project into smaller, more manageable parts | ||
and to differentiate between the "library" part of the project which is reusable | ||
and the "application" which builds on top of the library + metrics packages + additional services. | ||
|
||
An additional benefit is that it will be easier to refactor code if it is all accessible in the same repository. | ||
|
||
## Packages | ||
### `ref-core` | ||
This package will be a library containing the core functionality of the REF. | ||
This package will be a dependency of all other packages as it will describe the interfaces that metrics providers must implement. | ||
This allows us to keep the core functionality separate from the metrics providers. | ||
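
To make the idea of an interface owned by `ref-core` more concrete, here is a minimal sketch using `typing.Protocol`. The names `Metric`, `MetricResult`, and `run` are hypothetical and chosen only for illustration; the actual interface is whatever `ref-core` ends up defining.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Protocol, runtime_checkable


@dataclass(frozen=True)
class MetricResult:
    """Hypothetical container for the outcome of one metric execution."""

    successful: bool
    values: dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class Metric(Protocol):
    """Hypothetical interface that a metrics provider would implement."""

    name: str

    def run(self, dataset: Path) -> MetricResult:
        """Execute the metric against a dataset and return its result."""
        ...


class ConstantMetric:
    """Toy provider: satisfies the protocol without importing a base class."""

    name = "constant"

    def run(self, dataset: Path) -> MetricResult:
        return MetricResult(successful=True, values={"value": 1.0})
```

A structural interface like this lets a provider package satisfy the contract without inheriting from a `ref-core` class, although an abstract base class would work equally well.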
|
||
### `ref-metrics-*` | ||
Review comment: It would be easier for people who want to use the REF with a custom metric provider if these were maintained outside of the main REF repository. If we move the example to a separate repository, it will be much easier for relative outsiders to see how to implement the interface because they don't have to dig through all of the REF infrastructure and configuration to find what they're looking for.
Reply: Good point. Let's migrate the example outside of this repository. It is probably best to wait until we are happy with a PoC as it is easier to refactor all in one spot. We will also require publishing the Perhaps something for the new year. |
||
Each metric provider will have a separate package that implements the interfaces described by the `ref-core` package. | ||
These will be very thin packages that wrap around the existing APIs of the benchmarking packages or use CMEC to execute. | ||
|
||
An example package `ref-metrics-example` has been implemented in [CMIP-REF/cmip-ref#5](https://github.com/CMIP-REF/cmip-ref/pull/5). | ||
See the [code](https://github.com/CMIP-REF/cmip-ref/tree/basic-interface/packages/ref-metrics-example) for more details. | ||
|
||
The purpose of these packages is to provide a Python-based hook to execute a metric and return the results. | ||
Each package will either be a (hopefully thin) wrapper around the metric provider's existing API | ||
or build the required CMEC configuration files. | ||
This should require little to no change to the existing codebases of the benchmarking packages. | ||
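
The "thin wrapper" idea might look something like the sketch below. Everything here is a stand-in: `existing_benchmark_mean` plays the role of a function that already exists in a benchmarking package, and `GlobalMeanMetric` with its `run` method is a hypothetical hook, not the interface defined by `ref-core`.

```python
from __future__ import annotations

from collections.abc import Sequence
from statistics import fmean
from typing import Any


def existing_benchmark_mean(values: Sequence[float]) -> float:
    """Stand-in for a function that already exists in a benchmarking package."""
    return fmean(values)


class GlobalMeanMetric:
    """Thin hook: validate inputs, call the existing API, return plain results."""

    name = "global-mean"

    def run(self, values: Sequence[float]) -> dict[str, Any]:
        # The wrapper adds no science of its own; it only adapts the call.
        if not values:
            return {"metric": self.name, "successful": False, "value": None}
        return {
            "metric": self.name,
            "successful": True,
            "value": existing_benchmark_mean(values),
        }
```

The benchmarking function itself is untouched, which is the property the RFC is aiming for.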
|
||
The repository will set up a CI pipeline that will run the unit tests for each package. | ||
This includes fetching some sample data. | ||
|
||
These packages will maintain their own dependencies (Python and otherwise) and will be packaged up as a Docker image. | ||
|
||
The package will be named `ref-metrics-<provider>` where `<provider>` is the name of the provider. | ||
|
||
### `ref-api` (TBD) | ||
Review comment: The way I understand it, there are at least two use cases that each need their own application.
It looks like this proposes 1), but maybe it would be more convenient to start with 2) as having that is convenient for development. Or is that covered somewhere else already?
Reply: I would add/tweak another one:
I'm assuming that modelling centers want some form of tracking, but that is only a guess so we should validate that. The tracking requires a database, which is likely just SQLite and doesn't require any additional services. There is a lot of overlap between 1 and 2, but 3 is probably how benchmarking package maintainers will develop their packages. Each of the metrics packages should also be runnable directly in a notebook if you don't require the complexity that comes from the "compute engine". |
||
This application package will bring together the core and metrics packages to provide a service that | ||
responds to new data submissions | ||
and tracks the execution of metrics and their results. | ||
The package may have indirect dependencies on the metrics packages to avoid conflicting dependencies. | ||
That adds additional complexity, but will be more flexible in the long run. | ||
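
One possible reading of "indirect dependencies" is that the application never imports a metrics package, and instead calls it in a separate process that has its own environment. The sketch below illustrates that reading only; it is an assumption, not a decision recorded in this RFC. `PROVIDER_SCRIPT` stands in for a provider entry point, and `run_metric_out_of_process` is a hypothetical helper that exchanges JSON over stdin and stdout.

```python
from __future__ import annotations

import json
import subprocess
import sys

# Stand-in for a provider entry point that would live in its own environment.
PROVIDER_SCRIPT = """
import json, sys
request = json.load(sys.stdin)
values = request["values"]
result = {"metric": request["metric"], "value": sum(values) / len(values)}
json.dump(result, sys.stdout)
"""


def run_metric_out_of_process(
    interpreter: str, metric: str, values: list[float]
) -> dict:
    """Run a metric with the given interpreter and parse its JSON reply.

    The calling process only needs the standard library, so the provider's
    dependencies cannot conflict with the application's own.
    """
    completed = subprocess.run(
        [interpreter, "-c", PROVIDER_SCRIPT],
        input=json.dumps({"metric": metric, "values": values}),
        capture_output=True,
        text=True,
        check=True,  # raise if the provider process exits with an error
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # The current interpreter is used here only to keep the example runnable;
    # in practice it would be the interpreter of the provider's environment.
    print(run_metric_out_of_process(sys.executable, "example", [1.0, 3.0]))
```

A container image per provider would follow the same pattern, with the interpreter path replaced by a container invocation.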
Review comment on lines +74 to +75: I'm not sure what this means, I would recommend leaving this out or making it more explicit what is intended. |
||
|
||
This is the service that will be deployed to ESGF alongside any additional services that are required. | ||
|
||
Whether this application can be implemented solely as a CLI-based tool is up for debate. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
* More people working in the same repository may lead to more merge conflicts. | ||
* Larger repository with more moving parts/concepts | ||
|
||
|
||
Another drawback is that it will be harder to manage different package managers. | ||
The project currently uses [uv](https://docs.astral.sh/uv/) as a package manager as the core and framework | ||
will not require any complex-to-install dependencies, | ||
but science packages may require more complex dependencies. | ||
Review comment: If I understand it correctly, we're not planning to do any science in the REF. The science should happen in the metrics/diagnostics packages. Therefore I would not expect that we need any science dependencies.
Reply: Correct. But the ref-metric-* packages would pull those science dependencies in. Those metric provider packages probably should live outside of the I'll update the text to reflect this. |
||
There is nothing stopping the use of something like [pixi](https://pixi.sh/dev/) when conda-based | ||
dependencies are required for a particular package. | ||
|
||
The CI times may become longer. This may be somewhat insidious | ||
as at the beginning the developer experience will be great. | ||
This will gradually degrade as the number of packages/providers increases. | ||
To mitigate this, the tests for each provider can be run as a separate job. | ||
We may also choose to target a single Python version if it becomes an issue. | ||
|
||
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
A mono-repo structure will enable the rapid development of the project while it is immature | ||
and in a state of flux. | ||
It will be easier to refactor code as it is all in the same repository. | ||
|
||
- Splitting out each package to a separate repository is a possibility | ||
and is a very common approach in the Python community. | ||
The downside of separate repositories is that adding a new feature will often require changes across multiple repositories. | ||
The tooling for managing mono-repos is also more mature than it was a few years ago, which makes the proposed approach practical. | ||
- A single package with extras. | ||
The downside of this is that it will be harder to incorporate new providers as they will be incorporated | ||
into the main package. | ||
The framework will be an application rather than a library so pip installing it will not be as useful. | ||
|
||
|
||
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
I've tried this out in another [recent project](https://github.com/climate-resource/bookshelf) which | ||
published multiple packages to PyPI from a single repository. | ||
It worked well for that project as it enabled clearer separations and reuse of parts of the codebase | ||
without depending on a much larger package. | ||
|
||
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
* An example that uses `pixi` to install conda packages would be useful as a proof of concept. | ||
|
||
# Future possibilities | ||
[future-possibilities]: #future-possibilities | ||
|
||
Once the project has matured, we can consider splitting the packages into separate repositories. | ||
Each package can then have its own tooling and versioning. | ||
That would depend on packaging and publishing the `ref-core` package to PyPI and conda-forge. |