Add CLI for converting v2 metadata to v3 #3257

K-Meech · 2025-07-16T15:10:16Z

Adds a CLI using typer to convert v2 metadata (.zarray / .zattrs...) to v3 metadata zarr.json.

To test, you will need to install the new optional cli dependency e.g.
pip install -e ".[remote,cli]"

This should make the zarr-converter command available e.g. try:

zarr-converter --help
zarr-converter convert --help
zarr-converter clear --help

convert adds zarr.json files to every group / array, leaving the v2 metadata as-is. A zarr with both sets of metadata can still be opened with zarr.open, but this emits a UserWarning: "Both zarr.json (Zarr format 3) and .zarray (Zarr format 2) metadata objects exist... Zarr v3 will be used." This can be avoided by passing zarr_format=3 to zarr.open, or by using the clear command to remove the v2 metadata.

clear can also remove v3 metadata. This is useful if the conversion fails partway through, e.g. if one of the arrays uses a codec with no v3 equivalent.

All code for the cli is in src/zarr/core/metadata/converter/cli.py, with the actual conversion functions in src/zarr/core/metadata/converter/converter_v2_v3.py. These functions can be called directly, for those who don't want to use the CLI (although currently they are part of /core which is considered private API, so it may be best to move them elsewhere in the package).

Some points to consider:

I had to modify set_path in test_dtype_registry.py and test_codec_entrypoints.py, as they caused the CLI tests to fail whenever those ran afterwards. This seems to be because the lazy_load_list of the numcodecs codec registries was being cleared, so the codecs were no longer available to my code that finds the numcodecs.zarr3 equivalent of a numcodecs codec.
I tested this on local zarr images, so it would be great if someone with access to s3 / google cloud etc., could try it out on some small example images there.
I'm happy to add docs about how to use the CLI, but wanted to get feedback on the general structure first

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/user-guide/*.rst
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

…onversion

…sting a zarr version greater than 3

Merge changes from review

K-Meech · 2025-07-28T10:31:28Z

I've updated the structure of the CLI - hopefully this addresses both @dstansby's and @d-v-b's comments! You should be able to test with:

zarr --help
zarr migrate --help
zarr remove-metadata --help

I haven't addressed this comment yet, but will do so in the next round of changes (I know @dstansby had some additional comments to make on the converter implementation).

One known issue:

If you run zarr migrate --dry-run with a non-existent location, e.g. zarr migrate v3 data/images/example-1.zarr --dry-run, it will create an empty directory at that location. This is related to how r+ mode opens a store, and is reported in this issue: Using r+ mode with zarr.open creates an empty directory #3295
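Until that issue is resolved, one possible stopgap is to check that the location exists before anything opens it. The sketch below is only an illustration: require_existing_store is a hypothetical helper (not part of this PR or of zarr), and it assumes a local filesystem path, so it would not cover remote stores.

```python
from pathlib import Path
import tempfile


def require_existing_store(location: str) -> Path:
    """Hypothetical pre-flight check (local filesystem paths only).

    Raises before any store is opened, so nothing gets created on disk.
    """
    path = Path(location)
    if not path.exists():
        raise FileNotFoundError(f"No store found at {path}")
    return path


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "example-1.zarr"
    try:
        require_existing_store(str(missing))
        raised = False
    except FileNotFoundError:
        raised = True
    # The check itself must not create the directory.
    created = missing.exists()
```

Whether a check like this belongs in the CLI or should wait for a fix in zarr.open itself is a separate question.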

dstansby · 2025-07-28T12:46:34Z

Very nice! I had a play with it locally and it worked well. I'll do a fuller review of the code now.

dstansby

Looking great! I've reviewed the implementation, and left some comments; I'll move on to reviewing the tests next, but thought I'd post these comments first.

dstansby · 2025-07-28T13:02:10Z

src/zarr/core/metadata/converter/cli.py

+    else:
+        lvl = logging.WARNING
+    fmt = "%(message)s"
+    logging.basicConfig(level=lvl, format=fmt)


Suggested change

-    logging.basicConfig(level=lvl, format=fmt)
+    logger.basicConfig(level=lvl, format=fmt)

I think you want to configure the logger instance, not global settings?

I could do something like:

    if verbose:
        logger.setLevel(logging.INFO)
    else:
        logger.setLevel(logging.WARNING)
    logger.addHandler(logging.StreamHandler())

but the issue is this will only affect logs directly from the cli.py file. When using --dry-run, I also want to see the log of created / deleted files from migrate_to_v3.py, which won't be shown with this setting alone.

I could add a similar setup to the migrate_to_v3.py file, but this could be annoying for downstream code that uses these functions and wants to use a different logging level / setup e.g. to a file rather than the console.

It could work if I created the migrate_to_v3.py logger as an explicit child of the cli.py logger though i.e.

    # in cli.py
    logger = logging.getLogger("cli")

    # in migrate_to_v3.py
    logger = logging.getLogger("cli.migrate")

What do you think?
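As a rough check of the child-logger idea, here is a minimal standard-library sketch. The logger names "cli" and "cli.migrate" are the ones proposed above; configure_cli_logging and "some_other_module" are hypothetical names used only for illustration. It relies on the fact that the logging hierarchy follows dotted names, so a child with no level of its own inherits the parent's effective level and propagates its records to the parent's handlers.

```python
import io
import logging

# Names proposed in the comment above; "some_other_module" is a stand-in
# for any unrelated downstream logger.
cli_logger = logging.getLogger("cli")
migrate_logger = logging.getLogger("cli.migrate")  # child of "cli"
other_logger = logging.getLogger("some_other_module")


def configure_cli_logging(verbose: bool, stream) -> None:
    """Hypothetical helper: configure only the "cli" logger, not the root."""
    cli_logger.setLevel(logging.INFO if verbose else logging.WARNING)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    cli_logger.addHandler(handler)


stream = io.StringIO()
configure_cli_logging(verbose=True, stream=stream)

cli_logger.info("from cli")
migrate_logger.info("from migrate")  # inherits INFO, propagates to "cli"
other_logger.info("from elsewhere")  # never reaches the "cli" handler

output = stream.getvalue()
```

With this setup the captured output contains the messages from "cli" and "cli.migrate" but not the one from the unrelated logger, and downstream code that calls the migrate functions directly remains free to attach its own handlers or levels.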

dstansby · 2025-07-28T13:02:35Z

src/zarr/core/metadata/converter/cli.py

+
+
+def _set_verbose_level() -> None:
+    logging.getLogger().setLevel(logging.INFO)


Suggested change

-    logging.getLogger().setLevel(logging.INFO)
+    logger.setLevel(logging.INFO)

Same reason as above

dstansby · 2025-07-28T13:09:58Z

src/zarr/core/metadata/converter/cli.py

+        str | None,
+        typer.Argument(
+            help=(
+                "Output location to write generated metadata (no chunks will be copied). If not provided, "


Suggested change

-                "Output location to write generated metadata (no chunks will be copied). If not provided, "
+                "Output location to write generated metadata (no array data will be copied). If not provided, "

src/zarr/core/metadata/converter/cli.py

dstansby · 2025-07-28T13:28:53Z