diff --git a/text/0001-package-structure.md b/text/0001-package-structure.md new file mode 100644 index 0000000..f78e326 --- /dev/null +++ b/text/0001-package-structure.md @@ -0,0 +1,135 @@ +- Feature Name: `package-structure` +- Start Date: 2024-10-24 +- RFC PR: [CMIP-REF/rfcs#0001](https://github.com/CMIP-REF/rfcs/pull/0001) + +# Summary +[summary]: #summary + +This RFC proposes how we should structure the git repository to enable the rapid development of the project while it is immature. +The core of the project will have some reuse outside of the REF frontend +so it should be publishable as a separate package. +The goal is to make it easier for users to extend the project with new metrics providers. + +# Motivation +[motivation]: #motivation + +* At the start of this project we will be in a state of flux until we have a working prototype. + APIs will change, and we will be adding and removing features. +* Each metrics package will come with its own set of dependencies. Odds are that they will clash with other packages. +* We want something that is easy to extend in future, incorporating new metrics providers/metrics. +* Consistent tooling across the project + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +It is proposed that we start with a mono-repo structure consisting of multiple packages, +each with their own set of dependencies. +The tooling and CI will be shared across all projects. +There will be a `packages` directory at the root of the repo, with each package in its own subdirectory. +An example has been implemented in [CMIP-REF/cmip-ref#1](https://github.com/CMIP-REF/cmip-ref/pull/0001), +which contains a `ref-core` package and an example `ref-metrics-example` package. + +Each package will have a `README.md` file that describes the purpose of the package and how to use it, +its own `pyproject.toml` describing the package and it's dependencies, +and their own set of tests. +The test suite for each package is run with only the dependencies required for that package. + +A common set of integration tests will also be run to ensure that the packages work together. + +The purpose of this structure is to split the project into smaller, more manageable parts +and to differentiate between the "library" part of the project which is reusable +and the "application" which builds ontop of the library + metrics packages + additional services. + +An additional benefit is that it will be easier to refactor code if it is all accessible in the same repository. + +## Packages +### `ref-core` +This package will be a library containing the core functionality of the REF. +This package will be a dependency of all other packages as it will describe the interfaces that metrics providers must implement. +This allows us to keep the core functionality separate from the metrics providers. + +### `ref-metrics-*` +Each metric provider will implement a separate package that implements the interfaces described by the `ref-core` package. +These will very thin packages that wrap around the existing APIs of the benchmarking packages or use CMEC to execute. + +An example package `ref-metrics-example` has been implemented in [CMIP-REF/cmip-ref#5](https://github.com/CMIP-REF/cmip-ref/pull/5). +See the [code](https://github.com/CMIP-REF/cmip-ref/tree/basic-interface/packages/ref-metrics-example) for more details. + +The purpose of these packages are to provide a Python-based hook to execute a metric and return the results. +This will include (hopefully thin) wrappers around the metric providers existing APIs +or build the required CMEC configuration files. +It should require little to no change to the existing codebase benchmarking packages. + +The repository will set up a CI pipeline that will run the unit tests for each package. +This includes fetching some sample data. + +These packages will maintain their own dependencies (python and otherwise) and packaged up as a docker image. + +The package will be named `ref-metrics-` where `` is the name of the provider. + +### `ref-api` (TBD) +This application package will bring together the core and metrics packages to provide a service that +responds to new data submissions, +tracks the execution of metrics and the results. +The package may have indirect dependencies on the metrics packages to avoid conflicting dependencies. +That adds additional complexity, but will be more flexible in the long run. + +This is the service that will be deployed to ESGF alongside any additional services that are required. + +Whether this application can be done solely via a CLI-based tool is up for debate. + +# Drawbacks +[drawbacks]: #drawbacks + +* More people working in the same repository may lead to more merge conflicts. +* Larger repository with more moving parts/concepts + + +Another drawback is that it will be harder to manage different package managers. +The project currently uses [uv](https://docs.astral.sh/uv/) as a package manager as the core and framework +will not require any complex to install dependencies, +but science packages may require more complex dependencies. +There is nothing stopping the use of something like [pixi](https://pixi.sh/dev/) when conda-based +dependencies are required for a particular package. + +The CI times may become longer. This may be somewhat insidious +as at the beginning the developer experience will be great. +This will gradually degrade as the number of packages/providers increases. +The test for each provider can be run as a separate job. +We may also choose to target a single Python version if it becomes an issue. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +A mono-repo structure will enable the rapid development of the project while it is immature +and in a state of flux. +It will be easier to refactor code as it is all in the same repository. + +- Splitting out each package to a separate repository is a possibility +and is a very common approach in the Python community. +The tooling for managing mono-repos is more mature than it was a few years ago which enables the proposed approach. +The downside of that is that changes across multiple repositories will be required as we add new features. +- A single package with extras. +The downside of this is that it will be harder to incorporate new providers as they will be incorporated +into the main package. +The framework will be an application rather than a library so pip installing it will not be as useful. + +# Prior art +[prior-art]: #prior-art + +I've tried this out in another [recent project](https://github.com/climate-resource/bookshelf) which +published multiple packages to PyPI from a single repository. +It worked well for that project as it enabled clearer separations and reuse of parts of the codebase +without depending on a much larger package. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +* An example that uses `pixi` to install conda packages would be useful as a proof of concept. + +# Future possibilities +[future-possibilities]: #future-possibilities + +Once the project has matured, we can consider splitting the packages into separate repositories. +Each package can then have its own tooling and versioning. +That would depend on packaging and publishing the `ref-core` package to PyPI and conda-forge.