From 82a50c5f41f05312569b6ef686d80aa84437f69c Mon Sep 17 00:00:00 2001
From: Francis Charette-Migneault
Date: Tue, 23 Apr 2024 17:28:11 -0400
Subject: [PATCH 1/8] deprecated in favor of MLM extension

The *Machine Learning Model (MLM)* extension (https://github.com/crim-ca/mlm-extension) combines the fields that were previously defined in the *Deep Learning Model (DLM)* extension as well as most (all?) fields proposed by ML-Model. Some fields are renamed to avoid redundant details between the 2 references, while others are adjusted to allow more flexibility (e.g.: not just docker-compose runtime, but virtually anything). More best-practices and examples are provided to demonstrate the use of MLM along other STAC extensions to take advantage of the full STAC ecosystem.

Schema for MLM: https://crim-ca.github.io/mlm-extension/v1.0.0/schema.json
---
 README.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3f2f2e5..79544d4 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,16 @@
 # ML Model Extension Specification
 
+> :warning:
+> This repository is deprecated in favor of
+> [https://github.com/crim-ca/mlm-extension](https://github.com/crim-ca/mlm-extension).
+> The corresponding schemas are made available on +> [https://crim-ca.github.io/mlm-extension/](https://crim-ca.github.io/mlm-extension/). + - **Title:** ML Model - **Identifier:** - **Field Name Prefix:** ml-model - **Scope:** Item, Collection -- **Extension [Maturity Classification](https://github.com/radiantearth/stac-spec/tree/master/extensions/README.md#extension-maturity):** Proposal +- **Extension [Maturity Classification](https://github.com/radiantearth/stac-spec/tree/master/extensions/README.md#extension-maturity):** Deprecated - **Owner**: @duckontheweb This document explains the ML Model Extension to the [SpatioTemporal Asset From 0989894271a2c904aef4f256bb8d5fd1286e1075 Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Wed, 7 Aug 2024 14:43:26 -0700 Subject: [PATCH 2/8] draft migration path doc from ml-model to MLM --- MIGRATION_TO_MLM.md | 94 +++++++++++++++++++++++++++++++++++++++++++++ README.md | 15 ++++---- 2 files changed, 102 insertions(+), 7 deletions(-) create mode 100644 MIGRATION_TO_MLM.md diff --git a/MIGRATION_TO_MLM.md b/MIGRATION_TO_MLM.md new file mode 100644 index 0000000..ff533aa --- /dev/null +++ b/MIGRATION_TO_MLM.md @@ -0,0 +1,94 @@ +# Migration Guide: ML Model Extension to MLM Extension + +## Context + +The ML Model Extension was started at Radiant Earth on October 4th, 2021. It was possibly the first STAC extension dedicated to describing machine learning models. The extension incorporated inputs from 9 different organizations and was used to describe models in Radiant Earth's MLHub API. The announcement of this extension and its use in Radiant Earth's MLHub is described [here](https://medium.com/radiant-earth-insights/geospatial-models-now-available-in-radiant-mlhub-a41eb795d7d7). Radiant Earth's MLHub API and Python SDK are now [deprecated](https://mlhub.earth/?gad_source=1&gclid=CjwKCAjwk8e1BhALEiwAc8MHiBZ1JcpErgQXlna7FsB3dd-mlPpMF-jpLQJolBgtYLDOeH2k-cxxLRoCEqQQAvD_BwE). 
In order to support other current users of the ML Model extension, this document lays out a migration path to convert metadata to the Machine Learning Model Extension (MLM). + +## Shared Goals + +Both the ML Model Extension and the Machine Learning Model (MLM) Extension aim to provide a standard way to catalog machine learning (ML) models that work with Earth observation (EO) data. Their main goals are: + +1. **Search and Discovery**: Helping users find and use ML models. +2. **Describing Inference Requirements**: Making it easier to run these models by describing input requirements and outputs. +3. **Reproducibility**: Providing runtime information and links to assets so that model inference is reproducible. + +## Schema Changes + +### ML Model Extension +- **Scope**: Item, Collection +- **Field Name Prefix**: `ml-model` +- **Key Sections**: + - Item Properties + - Asset Objects + - Inference/Training Runtimes + - Relation Types + - Interpretation of STAC Fields + +### MLM Extension +- **Scope**: Collection, Item, Asset, Links +- **Field Name Prefix**: `mlm` +- **Key Sections**: + - Item Properties and Collection Fields + - Asset Objects + - Relation Types + - Model Input/Output Objects + - Best Practices + +Notable differences: + +- The MLM Extension covers more details at both the Item and Asset levels, making it easier to describe and use model metadata. +- The MLM Extension covers Runtime requirements within the [Container Asset](https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#container-asset), while the ML Model Extension records [similar information](./README.md#inferencetraining-runtimes) in the `ml-model:inference-runtime` or `ml-model:training-runtime` asset roles. +- The MLM extension has a corresponding Python library, [`stac-model`](https://pypi.org/project/stac-model/) which can be used to create and validate MLM metadata. 
An example of the library in action is [here](https://github.com/crim-ca/mlm-extension/blob/main/stac_model/examples.py#L14). The ML Model extension does not support this and requires the JSON to be written manually by interpreting the JSON Schema or existing examples. + +## Changes in Field Names + +### Item Properties + +| ML Model Extension | MLM Extension | Notes | +| ---------------------------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `ml-model:type` | N/A | No direct equivalent, it is implied by the `mlm` prefix in MLM fields and directly specified by the schema identifier. | +| `ml-model:learning_approach` | `mlm:tasks` | Removed in favor of specifying specific `mlm:tasks`. | +| `ml-model:prediction_type` | `mlm:tasks` | `mlm:tasks` provides a more comprehensive enum of prediction types. | +| `ml-model:architecture` | `mlm:architecture` | The MLM provides specific guidance on using Papers With Code - Computer Vision identifiers for model architectures. No guidance is provided in ML Model. | +| `ml-model:training-processor-type` | `mlm:accelerator` | MLM defines more choices for accelerators in an enum and specifies that this is the accelerator for inference (the focus of the MLM extension is inference). ML Model only accepts `cpu` or `gpu` but this isn't sufficient today where we have models optimized for different CPU architectures, CUDA GPUs, Intel GPUs, AMD GPUs, Mac Silicon, and TPUs. | +| `ml-model:training-os` | N/A | This field is no longer recommended in the MLM for training or inference; instead, users can specify an optional `mlm:training-runtime` asset. 
| + + +### New Fields in MLM + +- **`mlm:name`**: A required name for the model. +- **`mlm:framework`**: The framework used to train the model. +- **`mlm:framework_version`**: The version of the framework. Useful in case a container runtime asset is not specified or if the consumer of the MLM wants to run the model outside of a container. +- **`mlm:memory_size`**: The in-memory size of the model. +- **`mlm:total_parameters`**: Total number of model parameters. +- **`mlm:pretrained`**: Indicates if the model is derived from a pretrained model. +- **`mlm:pretrained_source`**: Source of the pretrained model by name or URL if it is less well known. +- **`mlm:batch_size_suggestion`**: Suggested batch size for the given accelerator. +- **`mlm:accelerator_constrained`**: Indicates if the model requires a specific accelerator. +- **`mlm:accelerator_summary`**: Description of the accelerator. This might contain details on the exact accelerator version (TPUv4 vs TPUv5) and their configuration. +- **`mlm:accelerator_count`**: Minimum number of accelerator instances required. +- **`mlm:input`**: Describes the model's input shape, dtype, and normalization and resize transformations. +- **`mlm:output`**: Describes the model's output shape and dtype. +- **`mlm:hyperparameters`**: Additional hyperparameters relevant to the model. + +### Asset Objects + +| ML Model Extension Role | MLM Extension Role | Notes | +| ---------------------------- | ----------------------- | -------------------------------------------------------------------------------------------------- | +| `ml-model:inference-runtime` | `mlm:inference-runtime` | Direct conversion; same role and function. | +| `ml-model:training-runtime` | `mlm:training-runtime` | Direct conversion; same role and function. | +| `ml-model:checkpoint` | `mlm:checkpoint` | Direct conversion; same role and function. | +| N/A | `mlm:model` | New required role for model assets in MLM. This represents the asset that is loaded for inference. 
| +| N/A | `mlm:source_code` | Recommended for providing source code details. | +| N/A | `mlm:container` | Recommended for containerized environments. | +| N/A | `mlm:training` | Recommended for training pipelines. | +| N/A | `mlm:inference` | Recommended for inference pipelines. | + + +The MLM is focused on search, discovery descriptions, and reproducibility of inference. Nevertheless, the MLM provides a recommended asset role for `mlm:training-runtime` and asset `mlm:training`, which can point to a container URL that has the training runtime requirements. The ML Model extension specifies a field for `ml-model:training-runtime` but like `mlm:training` it only contains the default STAC Asset fields and additional fields specified by the Container Asset. Training requirements typically differ from inference requirements so therefore we recommend that fields and assets for reproducing model training or fine-tuning models be contained in a separate STAC extension. + +## Getting Help + +If you have any questions about a migration, feel free to contact the maintainers by opening a discussion or issue on the [MLM repository](https://github.com/crim-ca/mlm-extension). + +If you see a feature missing in the MLM, feel free to open an issue describing your feature request. \ No newline at end of file diff --git a/README.md b/README.md index 79544d4..60bc3fc 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,7 @@ > [https://github.com/crim-ca/mlm-extension](https://github.com/crim-ca/mlm-extension).
 > The corresponding schemas are made available on
 > [https://crim-ca.github.io/mlm-extension/](https://crim-ca.github.io/mlm-extension/).
+> Documentation on migrating from the ML Model Extension to the Machine Learning Model Extension (MLM) is [here](./MIGRATION_TO_MLM.md).
 
 - **Title:** ML Model
 - **Identifier:**
@@ -14,7 +15,7 @@
 - **Owner**: @duckontheweb
 
 This document explains the ML Model Extension to the [SpatioTemporal Asset
-Catalog](https://github.com/radiantearth/stac-spec) (STAC) specification.
+Catalog](https://github.com/radiantearth/stac-spec) (STAC) specification.
 
 - Examples:
   - [Item example](examples/dummy/item.json): Shows the basic usage of the extension in a STAC Item
@@ -55,7 +56,7 @@ these models for the following types of use-cases:
   institutions are making an effort to publish code and examples along with academic publications to enable this kind of reproducibility. However, the quality and usability of this code and related documentation can vary widely and there are currently no standards that ensure that a new researcher could reproduce a given set of published results from the documentation. The STAC ML Model Extension aims to address this issue by
-  providing a detailed description of the training data and environment used in a ML model experiment.
+  providing a detailed description of the training data and environment used in a ML model experiment.
 
 ## Item Properties
 
@@ -72,7 +73,7 @@
 
 #### ml-model:learning_approach
 
-Describes the learning approach used to train the model. It is STRONGLY RECOMMENDED that you use one of the
+Describes the learning approach used to train the model. It is STRONGLY RECOMMENDED that you use one of the
 following values, but other values are allowed.
 
 - `"supervised"`
@@ -82,7 +83,7 @@ following values, but other values are allowed.
 
 #### ml-model:prediction_type
 
-Describes the type of predictions made by the model. It is STRONGLY RECOMMENDED that you use one of the
+Describes the type of predictions made by the model. It is STRONGLY RECOMMENDED that you use one of the
 following values, but other values are allowed. Note that not all Prediction Type values are valid for a given [Learning Approach](#ml-modellearning_approach).
@@ -126,7 +127,7 @@ While the Compose file defines nearly all of the parameters required to run the
 directory containing input data should be mounted to the container and to which host directory the output predictions should be written. The Compose file MUST define volume mounts for input and output data using the Compose [Interpolation syntax](https://github.com/compose-spec/compose-spec/blob/master/spec.md#interpolation). The input data volume MUST be defined by an
-`INPUT_DATA` variable and the output data volume MUST be defined by an `OUTPUT_DATA` variable.
+`INPUT_DATA` variable and the output data volume MUST be defined by an `OUTPUT_DATA` variable.
 
 For example, the following Compose file snippet would mount the host input directory to `/var/data/input` in the container and would mount the host output data directory to `/var/data/output` in the host container. In this contrived example, the script to run the model takes 2 arguments: the
@@ -214,10 +215,10 @@ extension, please open a PR to include it in the `examples` directory. Here are
 
 ### Running tests
 
-The same checks that run as checks on PR's are part of the repository and can be run locally to verify that changes are valid.
+The same checks that run as checks on PR's are part of the repository and can be run locally to verify that changes are valid.
 To run tests locally, you'll need `npm`, which is a standard part of any [node.js installation](https://nodejs.org/en/download/).
 
-First you'll need to install everything with npm once. Just navigate to the root of this repository and on
+First you'll need to install everything with npm once. Just navigate to the root of this repository and on
 your command line run:
 ```bash
 npm install

From 6b8dadd25a925388a97daab28473d3f618a69ac0 Mon Sep 17 00:00:00 2001
From: Francis Charette-Migneault
Date: Fri, 27 Sep 2024 11:38:42 -0400
Subject: [PATCH 3/8] Update README with deprecation and MLM redirect

See https://github.com/stac-extensions/mlm and https://github.com/orgs/stac-utils/discussions/4#discussioncomment-10767685
---
 README.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 79544d4..e128cee 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,15 @@
 # ML Model Extension Specification
 
-> :warning:
+> [!WARNING]
 > This repository is deprecated in favor of
-> [https://github.com/crim-ca/mlm-extension](https://github.com/crim-ca/mlm-extension).
+> [https://github.com/stac-extensions/mlm](https://github.com/stac-extensions/mlm).
 > The corresponding schemas are made available on
-> [https://crim-ca.github.io/mlm-extension/](https://crim-ca.github.io/mlm-extension/).
+> [https://stac-extensions.github.io/mlm/](https://stac-extensions.github.io/mlm/).
+>
+> It is **STRONGLY** recommended to migrate `ml-model` definitions to the `mlm` extension.
+> The `mlm` extension improves the model metadata definition and properties with added support for use cases not directly supported by `ml-model`.
+> It also provides increased interoperability with other STAC extensions, adds best-practices recommendations, provides tooling for creating
+> STAC attributes, and works toward alignment efforts from both geospatial and machine learning communities.
 
 - **Title:** ML Model
 - **Identifier:**

From cd8ab8c99cbd824ce3d7657987511833cf182367 Mon Sep 17 00:00:00 2001
From: Ryan Avery
Date: Mon, 4 Nov 2024 08:08:50 -0800
Subject: [PATCH 4/8] address feedback

---
 MIGRATION_TO_MLM.md | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/MIGRATION_TO_MLM.md b/MIGRATION_TO_MLM.md
index ff533aa..a1b8d5a 100644
--- a/MIGRATION_TO_MLM.md
+++ b/MIGRATION_TO_MLM.md
@@ -6,10 +6,10 @@ The ML Model Extension was started at Radiant Earth on October 4th, 2021. It was
 
 ## Shared Goals
 
-Both the ML Model Extension and the Machine Learning Model (MLM) Extension aim to provide a standard way to catalog machine learning (ML) models that work with Earth observation (EO) data. Their main goals are:
+Both the ML Model Extension and the Machine Learning Model (MLM) Extension aim to provide a standard way to catalog machine learning (ML) models that work with, but are not limited to, Earth observation (EO) data. Their main goals are:
 
 1. **Search and Discovery**: Helping users find and use ML models.
-2. **Describing Inference Requirements**: Making it easier to run these models by describing input requirements and outputs.
+2. **Describing Inference and Training Requirements**: Making it easier to run these models by describing input requirements and outputs.
 3. **Reproducibility**: Providing runtime information and links to assets so that model inference is reproducible.
 
 ## Schema Changes
@@ -39,6 +39,7 @@ Notable differences:
 - The MLM Extension covers more details at both the Item and Asset levels, making it easier to describe and use model metadata.
 - The MLM Extension covers Runtime requirements within the [Container Asset](https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#container-asset), while the ML Model Extension records [similar information](./README.md#inferencetraining-runtimes) in the `ml-model:inference-runtime` or `ml-model:training-runtime` asset roles.
 - The MLM extension has a corresponding Python library, [`stac-model`](https://pypi.org/project/stac-model/) which can be used to create and validate MLM metadata. An example of the library in action is [here](https://github.com/crim-ca/mlm-extension/blob/main/stac_model/examples.py#L14). The ML Model extension does not support this and requires the JSON to be written manually by interpreting the JSON Schema or existing examples.
+- MLM is easier to maintain and enhance in a fast moving ML ecosystem thanks to it's use of pydantic models, while still being compatible with pystac for extension and STAc core validation.
 
 ## Changes in Field Names
 
@@ -49,27 +50,29 @@ Notable differences:
 | `ml-model:type` | N/A | No direct equivalent, it is implied by the `mlm` prefix in MLM fields and directly specified by the schema identifier. |
 | `ml-model:learning_approach` | `mlm:tasks` | Removed in favor of specifying specific `mlm:tasks`. |
 | `ml-model:prediction_type` | `mlm:tasks` | `mlm:tasks` provides a more comprehensive enum of prediction types. |
-| `ml-model:architecture` | `mlm:architecture` | The MLM provides specific guidance on using Papers With Code - Computer Vision identifiers for model architectures. No guidance is provided in ML Model. |
-| `ml-model:training-processor-type` | `mlm:accelerator` | MLM defines more choices for accelerators in an enum and specifies that this is the accelerator for inference (the focus of the MLM extension is inference). ML Model only accepts `cpu` or `gpu` but this isn't sufficient today where we have models optimized for different CPU architectures, CUDA GPUs, Intel GPUs, AMD GPUs, Mac Silicon, and TPUs. |
+| `ml-model:architecture` | `mlm:architecture` | The MLM provides specific guidance on using *Papers With Code - Computer Vision* identifiers for model architectures. No guidance is provided in ML Model. |
+| `ml-model:training-processor-type` | `mlm:accelerator` | MLM defines more choices for accelerators in an enum and specifies that this is the accelerator for inference. ML Model only accepts `cpu` or `gpu` but this isn't sufficient today where we have models optimized for different CPU architectures, CUDA GPUs, Intel GPUs, AMD GPUs, Mac Silicon, and TPUs. |
 | `ml-model:training-os` | N/A | This field is no longer recommended in the MLM for training or inference; instead, users can specify an optional `mlm:training-runtime` asset. |
 
 
 ### New Fields in MLM
 
-- **`mlm:name`**: A required name for the model.
-- **`mlm:framework`**: The framework used to train the model.
-- **`mlm:framework_version`**: The version of the framework. Useful in case a container runtime asset is not specified or if the consumer of the MLM wants to run the model outside of a container.
-- **`mlm:memory_size`**: The in-memory size of the model.
-- **`mlm:total_parameters`**: Total number of model parameters.
-- **`mlm:pretrained`**: Indicates if the model is derived from a pretrained model.
-- **`mlm:pretrained_source`**: Source of the pretrained model by name or URL if it is less well known.
-- **`mlm:batch_size_suggestion`**: Suggested batch size for the given accelerator.
-- **`mlm:accelerator_constrained`**: Indicates if the model requires a specific accelerator. -- **`mlm:accelerator_summary`**: Description of the accelerator. This might contain details on the exact accelerator version (TPUv4 vs TPUv5) and their configuration. -- **`mlm:accelerator_count`**: Minimum number of accelerator instances required. -- **`mlm:input`**: Describes the model's input shape, dtype, and normalization and resize transformations. -- **`mlm:output`**: Describes the model's output shape and dtype. -- **`mlm:hyperparameters`**: Additional hyperparameters relevant to the model. +| Field Name | Description | +|----------------------------------|-------------------------------------------------------------------------------------------------------------------------| +| **`mlm:name`** | A required name for the model. | +| **`mlm:framework`** | The framework used to train the model. | +| **`mlm:framework_version`** | The version of the framework. Useful in case a container runtime asset is not specified or if the consumer of the MLM wants to run the model outside of a container. | +| **`mlm:memory_size`** | The in-memory size of the model. | +| **`mlm:total_parameters`** | Total number of model parameters. | +| **`mlm:pretrained`** | Indicates if the model is derived from a pretrained model. | +| **`mlm:pretrained_source`** | Source of the pretrained model by name or URL if it is less well known. | +| **`mlm:batch_size_suggestion`** | Suggested batch size for the given accelerator. | +| **`mlm:accelerator_constrained`**| Indicates if the model requires a specific accelerator. | +| **`mlm:accelerator_summary`** | Description of the accelerator. This might contain details on the exact accelerator version (TPUv4 vs TPUv5) and their configuration. | +| **`mlm:accelerator_count`** | Minimum number of accelerator instances required. | +| **`mlm:input`** | Describes the model's input shape, dtype, and normalization and resize transformations. 
| +| **`mlm:output`** | Describes the model's output shape and dtype. | +| **`mlm:hyperparameters`** | Additional hyperparameters relevant to the model. | ### Asset Objects From 4d9c92e203140787cf68871b478e2d054dc72478 Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Mon, 4 Nov 2024 09:18:38 -0800 Subject: [PATCH 5/8] address remaining feedback --- MIGRATION_TO_MLM.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/MIGRATION_TO_MLM.md b/MIGRATION_TO_MLM.md index a1b8d5a..ce99551 100644 --- a/MIGRATION_TO_MLM.md +++ b/MIGRATION_TO_MLM.md @@ -67,6 +67,7 @@ Notable differences: | **`mlm:pretrained`** | Indicates if the model is derived from a pretrained model. | | **`mlm:pretrained_source`** | Source of the pretrained model by name or URL if it is less well known. | | **`mlm:batch_size_suggestion`** | Suggested batch size for the given accelerator. | +| **`mlm:accelerator`**| Indicates the specific accelerator recommended for the model. | | **`mlm:accelerator_constrained`**| Indicates if the model requires a specific accelerator. | | **`mlm:accelerator_summary`** | Description of the accelerator. This might contain details on the exact accelerator version (TPUv4 vs TPUv5) and their configuration. | | **`mlm:accelerator_count`** | Minimum number of accelerator instances required. | @@ -81,14 +82,14 @@ Notable differences: | `ml-model:inference-runtime` | `mlm:inference-runtime` | Direct conversion; same role and function. | | `ml-model:training-runtime` | `mlm:training-runtime` | Direct conversion; same role and function. | | `ml-model:checkpoint` | `mlm:checkpoint` | Direct conversion; same role and function. | -| N/A | `mlm:model` | New required role for model assets in MLM. This represents the asset that is loaded for inference. | +| N/A | `mlm:model` | New required role for model assets in MLM. This represents the asset that is the source of model weights and definition. 
| | N/A | `mlm:source_code` | Recommended for providing source code details. | | N/A | `mlm:container` | Recommended for containerized environments. | -| N/A | `mlm:training` | Recommended for training pipelines. | -| N/A | `mlm:inference` | Recommended for inference pipelines. | +| N/A | `mlm:training` | Recommended for training pipeline assets. | +| N/A | `mlm:inference` | Recommended for inference pipeline assets. | -The MLM is focused on search, discovery descriptions, and reproducibility of inference. Nevertheless, the MLM provides a recommended asset role for `mlm:training-runtime` and asset `mlm:training`, which can point to a container URL that has the training runtime requirements. The ML Model extension specifies a field for `ml-model:training-runtime` but like `mlm:training` it only contains the default STAC Asset fields and additional fields specified by the Container Asset. Training requirements typically differ from inference requirements so therefore we recommend that fields and assets for reproducing model training or fine-tuning models be contained in a separate STAC extension. +The MLM provides a recommended asset role for `mlm:training-runtime` and asset `mlm:training`, which can point to a container URL that has the training runtime requirements. The ML Model extension specifies a field for `ml-model:training-runtime` and like `mlm:training` it only contains the default STAC Asset fields and a few additional fields specified by the Container Asset. Training requirements typically differ from inference requirements which is why there are two separate Container assets in both extensions. 
## Getting Help From 1ec663ed49c6511202ae0bd2dea0598d3d6b12e8 Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Mon, 4 Nov 2024 16:10:21 -0800 Subject: [PATCH 6/8] correct links --- MIGRATION_TO_MLM.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/MIGRATION_TO_MLM.md b/MIGRATION_TO_MLM.md index ce99551..0a079b8 100644 --- a/MIGRATION_TO_MLM.md +++ b/MIGRATION_TO_MLM.md @@ -37,8 +37,8 @@ Both the ML Model Extension and the Machine Learning Model (MLM) Extension aim t Notable differences: - The MLM Extension covers more details at both the Item and Asset levels, making it easier to describe and use model metadata. -- The MLM Extension covers Runtime requirements within the [Container Asset](https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#container-asset), while the ML Model Extension records [similar information](./README.md#inferencetraining-runtimes) in the `ml-model:inference-runtime` or `ml-model:training-runtime` asset roles. -- The MLM extension has a corresponding Python library, [`stac-model`](https://pypi.org/project/stac-model/) which can be used to create and validate MLM metadata. An example of the library in action is [here](https://github.com/crim-ca/mlm-extension/blob/main/stac_model/examples.py#L14). The ML Model extension does not support this and requires the JSON to be written manually by interpreting the JSON Schema or existing examples. +- The MLM Extension covers Runtime requirements within the [Container Asset](https://github.com/stac-extensions/mlm?tab=readme-ov-file#container-asset), while the ML Model Extension records [similar information](./README.md#inferencetraining-runtimes) in the `ml-model:inference-runtime` or `ml-model:training-runtime` asset roles. +- The MLM extension has a corresponding Python library, [`stac-model`](https://pypi.org/project/stac-model/) which can be used to create and validate MLM metadata. 
An example of the library in action is [here](https://github.com/stac-extensions/mlm/blob/main/stac_model/examples.py#L14). The ML Model extension does not support this and requires the JSON to be written manually by interpreting the JSON Schema or existing examples. - MLM is easier to maintain and enhance in a fast moving ML ecosystem thanks to it's use of pydantic models, while still being compatible with pystac for extension and STAc core validation. ## Changes in Field Names @@ -93,6 +93,6 @@ The MLM provides a recommended asset role for `mlm:training-runtime` and asset ` ## Getting Help -If you have any questions about a migration, feel free to contact the maintainers by opening a discussion or issue on the [MLM repository](https://github.com/crim-ca/mlm-extension). +If you have any questions about a migration, feel free to contact the maintainers by opening a discussion or issue on the [MLM repository](https://github.com/stac-extensions/mlm). If you see a feature missing in the MLM, feel free to open an issue describing your feature request. \ No newline at end of file From ad76ae076659cb9d17c3e9742341282eed0d9769 Mon Sep 17 00:00:00 2001 From: Francis Charette Migneault Date: Thu, 26 Jun 2025 12:25:10 -0400 Subject: [PATCH 7/8] update migration references --- MIGRATION_TO_MLM.md | 82 +++++++++++++-------------------------------- 1 file changed, 24 insertions(+), 58 deletions(-) diff --git a/MIGRATION_TO_MLM.md b/MIGRATION_TO_MLM.md index 0a079b8..d23d23a 100644 --- a/MIGRATION_TO_MLM.md +++ b/MIGRATION_TO_MLM.md @@ -1,12 +1,26 @@ # Migration Guide: ML Model Extension to MLM Extension +>[!IMPORTANT] +> For specific migration details from [ML-Model](README.md) to [Machine Learning Model (MLM)][mlm] +> please refer to the [Migration Document](https://github.com/stac-extensions/mlm/blob/main/docs/legacy/ml-model.md). + ## Context -The ML Model Extension was started at Radiant Earth on October 4th, 2021. 
It was possibly the first STAC extension dedicated to describing machine learning models. The extension incorporated inputs from 9 different organizations and was used to describe models in Radiant Earth's MLHub API. The announcement of this extension and its use in Radiant Earth's MLHub is described [here](https://medium.com/radiant-earth-insights/geospatial-models-now-available-in-radiant-mlhub-a41eb795d7d7). Radiant Earth's MLHub API and Python SDK are now [deprecated](https://mlhub.earth/?gad_source=1&gclid=CjwKCAjwk8e1BhALEiwAc8MHiBZ1JcpErgQXlna7FsB3dd-mlPpMF-jpLQJolBgtYLDOeH2k-cxxLRoCEqQQAvD_BwE). In order to support other current users of the ML Model extension, this document lays out a migration path to convert metadata to the Machine Learning Model Extension (MLM). +The ML Model Extension was started at Radiant Earth on October 4th, 2021. +It was possibly the first STAC extension dedicated to describing machine learning models. +The extension incorporated inputs from 9 different organizations and was used to describe models +in Radiant Earth's MLHub API. The announcement of this extension and its use in Radiant Earth's MLHub +is described [here](https://medium.com/radiant-earth-insights/geospatial-models-now-available-in-radiant-mlhub-a41eb795d7d7). +Radiant Earth's MLHub API and Python SDK are now [deprecated](https://mlhub.earth/?gad_source=1&gclid=CjwKCAjwk8e1BhALEiwAc8MHiBZ1JcpErgQXlna7FsB3dd-mlPpMF-jpLQJolBgtYLDOeH2k-cxxLRoCEqQQAvD_BwE). +In order to support other current users of the ML Model extension, this document lays out a migration path to convert +metadata to the [Machine Learning Model Extension (MLM)][mlm]. ## Shared Goals -Both the ML Model Extension and the Machine Learning Model (MLM) Extension aim to provide a standard way to catalog machine learning (ML) models that work with, but are not limited to, Earth observation (EO) data. 
Their main goals are: +Both the ML Model Extension and the [Machine Learning Model (MLM)][mlm] extension aim to provide a standard way to +catalog machine learning (ML) models that work with, but are not limited to, Earth observation (EO) data. + +Their main goals are: 1. **Search and Discovery**: Helping users find and use ML models. 2. **Describing Inference and Training Requirements**: Making it easier to run these models by describing input requirements and outputs. @@ -34,65 +48,17 @@ Both the ML Model Extension and the Machine Learning Model (MLM) Extension aim t - Model Input/Output Objects - Best Practices -Notable differences: +### Notable Differences - The MLM Extension covers more details at both the Item and Asset levels, making it easier to describe and use model metadata. -- The MLM Extension covers Runtime requirements within the [Container Asset](https://github.com/stac-extensions/mlm?tab=readme-ov-file#container-asset), while the ML Model Extension records [similar information](./README.md#inferencetraining-runtimes) in the `ml-model:inference-runtime` or `ml-model:training-runtime` asset roles. -- The MLM extension has a corresponding Python library, [`stac-model`](https://pypi.org/project/stac-model/) which can be used to create and validate MLM metadata. An example of the library in action is [here](https://github.com/stac-extensions/mlm/blob/main/stac_model/examples.py#L14). The ML Model extension does not support this and requires the JSON to be written manually by interpreting the JSON Schema or existing examples. -- MLM is easier to maintain and enhance in a fast moving ML ecosystem thanks to it's use of pydantic models, while still being compatible with pystac for extension and STAc core validation. 
- -## Changes in Field Names - -### Item Properties - -| ML Model Extension | MLM Extension | Notes | -| ---------------------------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `ml-model:type` | N/A | No direct equivalent, it is implied by the `mlm` prefix in MLM fields and directly specified by the schema identifier. | -| `ml-model:learning_approach` | `mlm:tasks` | Removed in favor of specifying specific `mlm:tasks`. | -| `ml-model:prediction_type` | `mlm:tasks` | `mlm:tasks` provides a more comprehensive enum of prediction types. | -| `ml-model:architecture` | `mlm:architecture` | The MLM provides specific guidance on using *Papers With Code - Computer Vision* identifiers for model architectures. No guidance is provided in ML Model. | -| `ml-model:training-processor-type` | `mlm:accelerator` | MLM defines more choices for accelerators in an enum and specifies that this is the accelerator for inference. ML Model only accepts `cpu` or `gpu` but this isn't sufficient today where we have models optimized for different CPU architectures, CUDA GPUs, Intel GPUs, AMD GPUs, Mac Silicon, and TPUs. | -| `ml-model:training-os` | N/A | This field is no longer recommended in the MLM for training or inference; instead, users can specify an optional `mlm:training-runtime` asset. | - - -### New Fields in MLM - -| Field Name | Description | -|----------------------------------|-------------------------------------------------------------------------------------------------------------------------| -| **`mlm:name`** | A required name for the model. | -| **`mlm:framework`** | The framework used to train the model. 
| -| **`mlm:framework_version`** | The version of the framework. Useful in case a container runtime asset is not specified or if the consumer of the MLM wants to run the model outside of a container. | -| **`mlm:memory_size`** | The in-memory size of the model. | -| **`mlm:total_parameters`** | Total number of model parameters. | -| **`mlm:pretrained`** | Indicates if the model is derived from a pretrained model. | -| **`mlm:pretrained_source`** | Source of the pretrained model by name or URL if it is less well known. | -| **`mlm:batch_size_suggestion`** | Suggested batch size for the given accelerator. | -| **`mlm:accelerator`**| Indicates the specific accelerator recommended for the model. | -| **`mlm:accelerator_constrained`**| Indicates if the model requires a specific accelerator. | -| **`mlm:accelerator_summary`** | Description of the accelerator. This might contain details on the exact accelerator version (TPUv4 vs TPUv5) and their configuration. | -| **`mlm:accelerator_count`** | Minimum number of accelerator instances required. | -| **`mlm:input`** | Describes the model's input shape, dtype, and normalization and resize transformations. | -| **`mlm:output`** | Describes the model's output shape and dtype. | -| **`mlm:hyperparameters`** | Additional hyperparameters relevant to the model. | - -### Asset Objects - -| ML Model Extension Role | MLM Extension Role | Notes | -| ---------------------------- | ----------------------- | -------------------------------------------------------------------------------------------------- | -| `ml-model:inference-runtime` | `mlm:inference-runtime` | Direct conversion; same role and function. | -| `ml-model:training-runtime` | `mlm:training-runtime` | Direct conversion; same role and function. | -| `ml-model:checkpoint` | `mlm:checkpoint` | Direct conversion; same role and function. | -| N/A | `mlm:model` | New required role for model assets in MLM. 
This represents the asset that is the source of model weights and definition. | -| N/A | `mlm:source_code` | Recommended for providing source code details. | -| N/A | `mlm:container` | Recommended for containerized environments. | -| N/A | `mlm:training` | Recommended for training pipeline assets. | -| N/A | `mlm:inference` | Recommended for inference pipeline assets. | - - -The MLM provides a recommended asset role for `mlm:training-runtime` and asset `mlm:training`, which can point to a container URL that has the training runtime requirements. The ML Model extension specifies a field for `ml-model:training-runtime` and like `mlm:training` it only contains the default STAC Asset fields and a few additional fields specified by the Container Asset. Training requirements typically differ from inference requirements which is why there are two separate Container assets in both extensions. +- The MLM Extension covers more runtime requirements using distinct asset roles. +- The MLM extension has better integration with the STAC Extensions and Python ecosystem. ## Getting Help -If you have any questions about a migration, feel free to contact the maintainers by opening a discussion or issue on the [MLM repository](https://github.com/stac-extensions/mlm). +If you have any questions about a migration, feel free to contact the maintainers by opening a discussion or issue +on the [MLM repository][mlm]. + +If you see a feature missing in the MLM, feel free to open an issue describing your feature request. -If you see a feature missing in the MLM, feel free to open an issue describing your feature request. 
\ No newline at end of file +[mlm]: https://github.com/stac-extensions/mlm From 935840c1578322607c2a625f18f6b615070deff0 Mon Sep 17 00:00:00 2001 From: Francis Charette Migneault Date: Thu, 26 Jun 2025 12:33:37 -0400 Subject: [PATCH 8/8] update & fix markdown --- MIGRATION_TO_MLM.md | 8 ++++++-- README.md | 16 ++++++++++++---- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/MIGRATION_TO_MLM.md b/MIGRATION_TO_MLM.md index d23d23a..b3237e6 100644 --- a/MIGRATION_TO_MLM.md +++ b/MIGRATION_TO_MLM.md @@ -1,8 +1,12 @@ # Migration Guide: ML Model Extension to MLM Extension + + >[!IMPORTANT] -> For specific migration details from [ML-Model](README.md) to [Machine Learning Model (MLM)][mlm] -> please refer to the [Migration Document](https://github.com/stac-extensions/mlm/blob/main/docs/legacy/ml-model.md). +> For specific field migration details from [ML-Model](README.md) to [Machine Learning Model (MLM)][mlm], please refer +> to the [MLM Migration Document](https://github.com/stac-extensions/mlm/blob/main/docs/legacy/ml-model.md). + + ## Context diff --git a/README.md b/README.md index c9bf3b0..da0ccf6 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,24 @@ # ML Model Extension Specification + + > [!WARNING] > This repository is deprecated in favor of > [https://github.com/stac-extensions/mlm](https://github.com/stac-extensions/mlm).
> The corresponding schemas are made available on > [https://stac-extensions.github.io/mlm/](https://stac-extensions.github.io/mlm/). -> Documentation on migrating from the Ml Model Extension to the Machine Learning Model Extension (MLM) is [here](./MIGRATION_TO_MLM.md). +> Documentation on migrating from the ML-Model extension to the Machine Learning Model (MLM) extension +> is [here](./MIGRATION_TO_MLM.md). Further details are also available in the +> [MLM Migration Document](https://github.com/stac-extensions/mlm/blob/main/docs/legacy/ml-model.md). > > It is **STRONGLY** recommended to migrate `ml-model` definitions to the `mlm` extension. -> The `mlm` extension improves the model metadata definition and properties with added support for use cases not directly supported by `ml-model`. -> It also provides increased interroperability with other STAC extensions, adds best-practices recommendations, provides tooling for creating -> STAC attributes, and works toward alignement efforts from both geospatial and machine learning communities. +> The `mlm` extension improves the model metadata definition and properties with added support +> for use cases not directly supported by `ml-model`. +> It also provides increased interoperability with other STAC extensions, adds best-practice recommendations, +> provides tooling for creating STAC attributes, and works toward alignment efforts from both the geospatial and +> machine learning communities. + + - **Title:** ML Model - **Identifier:**