update: mat3ra-2d 2026-06 review response nb by VsevolodX · Pull Request #338 · mat3ra/api-examples

VsevolodX · 2026-06-26T16:05:59Z

Summary by CodeRabbit

New Features
- Added a new interactive notebook demonstrating dataset download and local caching, structure loading, and manifest-based checks.
- Showcases structural diversity stats, conversions to common structure formats, and key slab properties.
- Includes provenance examples that display interface, slab, and embedded bulk metadata relationships.
- Adds optional examples for grain-boundary entries and batch summaries across multiple structure types.

review-notebook-app · 2026-06-26T16:06:05Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

coderabbitai · 2026-06-26T16:06:13Z

Warning

Review limit reached

@VsevolodX, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes and 49 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3e9820c-f4d9-456f-899a-62f6154e73d3

📥 Commits

Reviewing files that changed from the base of the PR and between 52199f4 and 34843d3.

📒 Files selected for processing (1)

other/mat3rials_explorer/provenance_and_interoperability_demo.ipynb

📝 Walkthrough

Walkthrough

Adds a notebook that downloads the Mat3rials Explorer dataset with local caching, loads its manifest, computes dataset diversity stats, converts structures to pymatgen and ASE, and prints slab, provenance, grain-boundary, and multi-structure summaries.

Changes

Material provenance and interoperability demo

Layer / File(s)	Summary
Dataset fetch and manifest `examples/material/provenance_and_interoperability_demo.ipynb`	Resolves the Figshare article id, downloads the dataset into a local cache, validates `manifest.yaml`, and loads the manifest.
Structural diversity stats `examples/material/provenance_and_interoperability_demo.ipynb`	Counts form factors and dimensionalities from `manifest["sources"]` and prints totals and unique formula counts.
Conversion helpers and slab example `examples/material/provenance_and_interoperability_demo.ipynb`	Installs ASE and pymatgen if needed, defines Mat3ra JSON converters, and runs a slab conversion example through both adapters.
Provenance chain and grain boundary `examples/material/provenance_and_interoperability_demo.ipynb`	Extracts build configuration data, prints interface and slab provenance fields, and reports a grain-boundary entry's metadata keys.
Structure sweep and wrap-up `examples/material/provenance_and_interoperability_demo.ipynb`	Iterates across selected structure types, prints pymatgen summaries, and ends with notebook statistics and metadata.

Sequence Diagram(s)

sequenceDiagram
  participant Notebook as Notebook cell
  participant Cache as dataset_is_cached
  participant Download as download_figshare_article
  participant Pymatgen as mat3ra_to_pymatgen
  participant ASE as mat3ra_to_ase
  participant Provenance as get_interface_configuration

  Notebook->>Cache: check data_dir against manifest.yaml
  Cache-->>Notebook: cache hit or miss
  Notebook->>Download: fetch the Figshare article when needed
  Notebook->>Pymatgen: convert Mat3ra JSON to Structure
  Notebook->>ASE: convert Mat3ra JSON to Atoms
  Notebook->>Provenance: extract InterfaceConfiguration metadata

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through Figshare, swift and bright,
With slab and cache and manifest in sight.
I nibbled provenance, hop by hop,
And watched the structure stories pop.
Pymatgen, ASE, now dance as one—
A bunny’s notebook hop is done.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title is only loosely related to the notebook and reads more like a work note than a clear summary of the change.	Rename it to describe the new notebook, e.g. "Add provenance and interoperability demo notebook for Mat3rials Explorer".

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/mat3rials-explorer

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (1)

examples/material/provenance_and_interoperability_demo.ipynb (1)

304-304: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Summarize symbols instead of printing every atom.

For large slabs this bloats the notebook output and makes reviews/renders noisy.

Suggested output reduction

-    print(f"  Chemical symbols: {ase_atoms.get_chemical_symbols()}")
+    print(f"  Chemical symbols: {dict(Counter(ase_atoms.get_chemical_symbols()))}")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/material/provenance_and_interoperability_demo.ipynb` at line 304,
The notebook cell currently prints every entry from
ase_atoms.get_chemical_symbols(), which creates noisy output for large slabs.
Update the nearby print logic in the demo cell to summarize the composition
instead of listing all atoms, using a concise aggregate derived from
ase_atoms.get_chemical_symbols() so the output stays compact and readable.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/material/provenance_and_interoperability_demo.ipynb`:
- Around line 21-28: The notebook cell contains stale execution output that no
longer matches the current source and includes a machine-specific temp path.
Clear the saved outputs or rerun the notebook end-to-end so the outputs in
provenance_and_interoperability_demo.ipynb reflect the current behavior,
especially the cell that prints the Mat3rials Explorer download messages.
- Around line 179-196: The notebook install cells for ASE and pymatgen use
unpinned !pip installs that can drift from the versions declared in
pyproject.toml and may target the wrong interpreter. Update the import fallback
logic in the notebook cell that handles ase and pymatgen installation to use
%pip (or install the project dependencies directly) and pin the versions to
match the repository’s declared compatibility, keeping the existing import flow
in the same cell.
- Around line 537-542: The demo in the notebook is using a hardcoded list of
structure types that doesn’t match the manifest’s actual form_factor values,
causing some examples to be skipped. Update the structure selection logic around
the structure_types list and the manifest["sources"] lookup so the displayed
demos are driven from the unique form_factor values present in the manifest (for
example via dynamic extraction/deduplication) instead of fixed labels like Grain
Boundary, Defect, and 3D Crystal.
- Around line 606-607: The final success messages are hardcoded in the notebook
and should be computed from the actual conversion/provenance results instead of
always printing checkmarks. Update the cells around the sampled structure
conversion and provenance sampling logic so the summary reflects whether every
sampled structure converted successfully and whether provenance metadata was
actually preserved for the processed entries, using the existing notebook flow
rather than static prints.
- Around line 67-84: The Figshare download flow in figshare_article_id and
download_figshare_article has no explicit timeout handling and writes directly
to the final target, which can leave bad cached files after interruptions.
Update both urllib.request.urlopen calls to use a timeout, and change the file
download path in download_figshare_article so it downloads to a
temporary/incomplete file first and only moves it into place once the transfer
succeeds. Keep the existing article_id and article/files flow intact while
making the caching behavior atomic.
- Around line 88-107: Validate the loaded manifest before using it in
dataset_is_cached and after reading MANIFEST_NAME in the notebook cell, since
yaml.safe_load() may return None or a non-dict. Update the manifest handling
around dataset_is_cached and the later manifest access so it checks for a
mapping and a valid sources field before calling .get() or indexing
manifest["sources"], and fail early with a clear error if the manifest is
invalid.
- Around line 81-84: The file download loop in the notebook uses
file_info["name"] directly to build the local target path, which allows path
traversal or absolute-path escapes. Update the file handling in the
article["files"] iteration to sanitize and validate each filename before joining
it to data_dir, ensuring the resolved destination stays under the cache
directory. Keep the logic anchored around the existing file_info["name"],
target, and urllib.request.urlretrieve flow.

---

Nitpick comments:
In `@examples/material/provenance_and_interoperability_demo.ipynb`:
- Line 304: The notebook cell currently prints every entry from
ase_atoms.get_chemical_symbols(), which creates noisy output for large slabs.
Update the nearby print logic in the demo cell to summarize the composition
instead of listing all atoms, using a concise aggregate derived from
ase_atoms.get_chemical_symbols() so the output stays compact and readable.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 13a9a2f5-ffd8-472e-8bc9-c0c5357d763e

📥 Commits

Reviewing files that changed from the base of the PR and between 8e1a82d and 52199f4.

📒 Files selected for processing (1)

examples/material/provenance_and_interoperability_demo.ipynb

coderabbitai · 2026-06-26T16:12:29Z

+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Downloading Mat3rials Explorer dataset from Figshare...\n",
+            "Downloaded 77 structures to /var/folders/wq/kjb0_d9126xd_3j3c13f7n9w0000gn/T/mat3rials_figshare_j72yv3nv\n"
+          ]


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Clear or rerun stale notebook outputs.

This output still shows a local temp path and messages that no longer match the current source cell. Clear outputs before commit or rerun the notebook end-to-end after the final code changes.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 21 - 28, The notebook cell contains stale execution output that no longer matches the current source and includes a machine-specific temp path. Clear the saved outputs or rerun the notebook end-to-end so the outputs in provenance_and_interoperability_demo.ipynb reflect the current behavior, especially the cell that prints the Mat3rials Explorer download messages.

coderabbitai · 2026-06-26T16:12:29Z

+        "    with urllib.request.urlopen(url) as response:\n",
+        "        final_url = response.geturl()\n",
+        "    match = re.search(r\"/(\\d+)(?:/|$)\", urllib.parse.urlparse(final_url).path)\n",
+        "    if not match:\n",
+        "        raise ValueError(f\"Could not find Figshare article id in {url}\")\n",
+        "    return match.group(1)\n",
+        "\n",
+        "\n",
+        "def download_figshare_article(url, data_dir):\n",
+        "    article_id = figshare_article_id(url)\n",
+        "    api_url = f\"https://api.figshare.com/v2/articles/{article_id}\"\n",
+        "    with urllib.request.urlopen(api_url) as response:\n",
+        "        article = json.load(response)\n",
+        "    data_dir.mkdir(parents=True, exist_ok=True)\n",
+        "    for file_info in article[\"files\"]:\n",
+        "        target = data_dir / file_info[\"name\"]\n",
+        "        target.parent.mkdir(parents=True, exist_ok=True)\n",
+        "        urllib.request.urlretrieve(file_info[\"download_url\"], target)\n",


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Add explicit network timeouts and avoid partial cache files.

The Figshare requests can hang indefinitely, and urlretrieve() writes directly to the final path, so an interrupted download can later be treated as cached.

Suggested download helper

+import shutil + +FIGSHARE_TIMEOUT_SECONDS = 60 + ... - with urllib.request.urlopen(url) as response: + with urllib.request.urlopen(url, timeout=FIGSHARE_TIMEOUT_SECONDS) as response: final_url = response.geturl() ... - with urllib.request.urlopen(api_url) as response: + with urllib.request.urlopen(api_url, timeout=FIGSHARE_TIMEOUT_SECONDS) as response: article = json.load(response) ... - urllib.request.urlretrieve(file_info["download_url"], target) + tmp_target = target.with_suffix(target.suffix + ".part") + with urllib.request.urlopen(file_info["download_url"], timeout=FIGSHARE_TIMEOUT_SECONDS) as response: + with tmp_target.open("wb") as output: + shutil.copyfileobj(response, output) + tmp_target.replace(target)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

" with urllib.request.urlopen(url) as response:\n",

" final_url = response.geturl()\n",

" match = re.search(r\"/(\\d+)(?:/|$)\", urllib.parse.urlparse(final_url).path)\n",

" if not match:\n",

" raise ValueError(f\"Could not find Figshare article id in {url}\")\n",

" return match.group(1)\n",

"\n",

"\n",

"def download_figshare_article(url, data_dir):\n",

" article_id = figshare_article_id(url)\n",

" api_url = f\"https://api.figshare.com/v2/articles/{article_id}\"\n",

" with urllib.request.urlopen(api_url) as response:\n",

" article = json.load(response)\n",

" data_dir.mkdir(parents=True, exist_ok=True)\n",

" for file_info in article[\"files\"]:\n",

" target = data_dir / file_info[\"name\"]\n",

" target.parent.mkdir(parents=True, exist_ok=True)\n",

" urllib.request.urlretrieve(file_info[\"download_url\"], target)\n",

import shutil

FIGSHARE_TIMEOUT_SECONDS = 60

def figshare_article_id(url):

with urllib.request.urlopen(url, timeout=FIGSHARE_TIMEOUT_SECONDS) as response:

final_url = response.geturl()

match = re.search(r"/(\d+)(?:/|$)", urllib.parse.urlparse(final_url).path)

if not match:

raise ValueError(f"Could not find Figshare article id in {url}")

return match.group(1)

def download_figshare_article(url, data_dir):

article_id = figshare_article_id(url)

api_url = f"https://api.figshare.com/v2/articles/{article_id}"

with urllib.request.urlopen(api_url, timeout=FIGSHARE_TIMEOUT_SECONDS) as response:

article = json.load(response)

data_dir.mkdir(parents=True, exist_ok=True)

for file_info in article["files"]:

target = data_dir / file_info["name"]

target.parent.mkdir(parents=True, exist_ok=True)

tmp_target = target.with_suffix(target.suffix + ".part")

with urllib.request.urlopen(file_info["download_url"], timeout=FIGSHARE_TIMEOUT_SECONDS) as response:

with tmp_target.open("wb") as output:

shutil.copyfileobj(response, output)

tmp_target.replace(target)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 67 - 84, The Figshare download flow in figshare_article_id and download_figshare_article has no explicit timeout handling and writes directly to the final target, which can leave bad cached files after interruptions. Update both urllib.request.urlopen calls to use a timeout, and change the file download path in download_figshare_article so it downloads to a temporary/incomplete file first and only moves it into place once the transfer succeeds. Keep the existing article_id and article/files flow intact while making the caching behavior atomic.

coderabbitai · 2026-06-26T16:12:29Z

+        "    for file_info in article[\"files\"]:\n",
+        "        target = data_dir / file_info[\"name\"]\n",
+        "        target.parent.mkdir(parents=True, exist_ok=True)\n",
+        "        urllib.request.urlretrieve(file_info[\"download_url\"], target)\n",


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Constrain Figshare filenames before writing them locally.

file_info["name"] comes from an external API; an absolute path or ../ segment would escape data_dir and overwrite files outside the cache.

Suggested hardening

data_dir.mkdir(parents=True, exist_ok=True) + base_dir = data_dir.resolve() for file_info in article["files"]: - target = data_dir / file_info["name"] + target = (base_dir / file_info["name"]).resolve() + if not target.is_relative_to(base_dir): + raise ValueError(f"Unsafe Figshare filename: {file_info['name']}") target.parent.mkdir(parents=True, exist_ok=True) urllib.request.urlretrieve(file_info["download_url"], target)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

" for file_info in article[\"files\"]:\n",

" target = data_dir / file_info[\"name\"]\n",

" target.parent.mkdir(parents=True, exist_ok=True)\n",

" urllib.request.urlretrieve(file_info[\"download_url\"], target)\n",

" data_dir.mkdir(parents=True, exist_ok=True)\n",

" base_dir = data_dir.resolve()\n",

" for file_info in article[\"files\"]:\n",

" target = (base_dir / file_info[\"name\"]).resolve()\n",

" if not target.is_relative_to(base_dir):\n",

" raise ValueError(f\"Unsafe Figshare filename: {file_info['name']}\")\n",

" target.parent.mkdir(parents=True, exist_ok=True)\n",

" urllib.request.urlretrieve(file_info[\"download_url\"], target)\n",

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 81 - 84, The file download loop in the notebook uses file_info["name"] directly to build the local target path, which allows path traversal or absolute-path escapes. Update the file handling in the article["files"] iteration to sanitize and validate each filename before joining it to data_dir, ensuring the resolved destination stays under the cache directory. Keep the logic anchored around the existing file_info["name"], target, and urllib.request.urlretrieve flow.

coderabbitai · 2026-06-26T16:12:29Z

+        "def dataset_is_cached(data_dir):\n",
+        "    manifest_path = data_dir / MANIFEST_NAME\n",
+        "    if not manifest_path.exists():\n",
+        "        return False\n",
+        "    manifest = yaml.safe_load(manifest_path.read_text())\n",
+        "    sources = manifest.get(\"sources\") or []\n",
+        "    if not sources:\n",
+        "        return False\n",
+        "    return all(structure_path(entry[\"filename\"]).exists() for entry in sources)\n",
+        "\n",
+        "\n",
+        "DATA_DIR = default_data_dir()\n",
+        "if dataset_is_cached(DATA_DIR):\n",
+        "    print(f\"Using cached dataset in {DATA_DIR.resolve()}\")\n",
+        "else:\n",
+        "    print(f\"Downloading Mat3rials Explorer dataset from Figshare to {DATA_DIR.resolve()}...\")\n",
+        "    download_figshare_article(FIGSHARE_URL, DATA_DIR)\n",
+        "\n",
+        "manifest = yaml.safe_load((DATA_DIR / MANIFEST_NAME).read_text())\n",
+        "print(f\"Loaded {len(manifest['sources'])} structures from {DATA_DIR.resolve()}\")"


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Validate manifest.yaml before cache checks and indexing.

yaml.safe_load() can return None or a non-dict, but the code immediately calls .get() and later indexes manifest["sources"]. Validate once and fail with a clear message.

Suggested manifest loader

+def load_manifest(data_dir): + manifest_path = data_dir / MANIFEST_NAME + manifest = yaml.safe_load(manifest_path.read_text()) + if not isinstance(manifest, dict) or not isinstance(manifest.get("sources"), list): + raise ValueError(f"{manifest_path} must contain a 'sources' list") + return manifest + + def dataset_is_cached(data_dir): - manifest_path = data_dir / MANIFEST_NAME - if not manifest_path.exists(): - return False - manifest = yaml.safe_load(manifest_path.read_text()) + try: + manifest = load_manifest(data_dir) + except (FileNotFoundError, ValueError): + return False sources = manifest.get("sources") or [] ... -manifest = yaml.safe_load((DATA_DIR / MANIFEST_NAME).read_text()) +manifest = load_manifest(DATA_DIR)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"def dataset_is_cached(data_dir):\n",

" manifest_path = data_dir / MANIFEST_NAME\n",

" if not manifest_path.exists():\n",

" return False\n",

" manifest = yaml.safe_load(manifest_path.read_text())\n",

" sources = manifest.get(\"sources\") or []\n",

" if not sources:\n",

" return False\n",

" return all(structure_path(entry[\"filename\"]).exists() for entry in sources)\n",

"\n",

"\n",

"DATA_DIR = default_data_dir()\n",

"if dataset_is_cached(DATA_DIR):\n",

" print(f\"Using cached dataset in {DATA_DIR.resolve()}\")\n",

"else:\n",

" print(f\"Downloading Mat3rials Explorer dataset from Figshare to {DATA_DIR.resolve()}...\")\n",

" download_figshare_article(FIGSHARE_URL, DATA_DIR)\n",

"\n",

"manifest = yaml.safe_load((DATA_DIR / MANIFEST_NAME).read_text())\n",

"print(f\"Loaded {len(manifest['sources'])} structures from {DATA_DIR.resolve()}\")"

"def load_manifest(data_dir):\n",

" manifest_path = data_dir / MANIFEST_NAME\n",

" manifest = yaml.safe_load(manifest_path.read_text())\n",

" if not isinstance(manifest, dict) or not isinstance(manifest.get(\"sources\"), list):\n",

" raise ValueError(f\"{manifest_path} must contain a 'sources' list\")\n",

" return manifest\n",

"\n",

"\n",

"def dataset_is_cached(data_dir):\n",

" try:\n",

" manifest = load_manifest(data_dir)\n",

" except (FileNotFoundError, ValueError):\n",

" return False\n",

" sources = manifest.get(\"sources\") or []\n",

" if not sources:\n",

" return False\n",

" return all(structure_path(entry[\"filename\"]).exists() for entry in sources)\n",

"\n",

"\n",

"DATA_DIR = default_data_dir()\n",

"if dataset_is_cached(DATA_DIR):\n",

" print(f\"Using cached dataset in {DATA_DIR.resolve()}\")\n",

"else:\n",

" print(f\"Downloading Mat3rials Explorer dataset from Figshare to {DATA_DIR.resolve()}...\")\n",

" download_figshare_article(FIGSHARE_URL, DATA_DIR)\n",

"\n",

"manifest = load_manifest(DATA_DIR)\n",

"print(f\"Loaded {len(manifest['sources'])} structures from {DATA_DIR.resolve()}\")"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 88 - 107, Validate the loaded manifest before using it in dataset_is_cached and after reading MANIFEST_NAME in the notebook cell, since yaml.safe_load() may return None or a non-dict. Update the manifest handling around dataset_is_cached and the later manifest access so it checks for a mapping and a valid sources field before calling .get() or indexing manifest["sources"], and fail early with a clear error if the manifest is invalid.

coderabbitai · 2026-06-26T16:12:29Z

+        "# Install required packages if needed\n",
+        "try:\n",
+        "    import ase\n",
+        "    from ase.visualize import view\n",
+        "except ImportError:\n",
+        "    print(\"Installing ASE...\")\n",
+        "    !pip install ase\n",
+        "    import ase\n",
+        "    from ase.visualize import view\n",
+        "\n",
+        "try:\n",
+        "    import pymatgen\n",
+        "    from pymatgen.core import Structure as PymatgenStructure\n",
+        "except ImportError:\n",
+        "    print(\"Installing pymatgen...\")\n",
+        "    !pip install pymatgen\n",
+        "    import pymatgen\n",
+        "    from pymatgen.core import Structure as PymatgenStructure"


🎯 Functional Correctness | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Find project dependency declarations for ASE/pymatgen. fd -i 'pyproject\.toml|requirements.*\.txt|environment.*\.ya?ml|setup\.py|setup\.cfg|Pipfile|poetry\.lock' . \ -x sh -c 'printf "\n### %s\n" "$1"; rg -n "(^|[\"'"'"'[:space:]])(ase|pymatgen)([<=>[:space:]\"'"'"']|$)" "$1" || true' sh {}

Repository: mat3ra/api-examples

Length of output: 418

Pin ase and pymatgen versions in notebook installs to match pyproject.toml.

The notebook uses unpinned !pip install which may target the wrong interpreter and cause version drift. The repository already declares compatible versions in pyproject.toml:

ase>=3.25.0

pymatgen==2024.4.13

Update the install logic to use these exact pins and prefer %pip for kernel targeting:

Current problematic pattern

!pip install ase !pip install pymatgen

Replace with:

!pip install "ase>=3.25.0" "pymatgen==2024.4.13"

Or install the project dependencies directly if available:

!pip install -e .

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 179 - 196, The notebook install cells for ASE and pymatgen use unpinned !pip installs that can drift from the versions declared in pyproject.toml and may target the wrong interpreter. Update the import fallback logic in the notebook cell that handles ase and pymatgen installation to use %pip (or install the project dependencies directly) and pin the versions to match the repository’s declared compatibility, keeping the existing import flow in the same cell.

coderabbitai · 2026-06-26T16:12:29Z

+        "# Sample different structure types\n",
+        "structure_types = [\"Interface\", \"Slab\", \"Grain Boundary\", \"Defect\", \"3D Crystal\"]\n",
+        "\n",
+        "print(\"Structure Type Demonstrations:\\n\")\n",
+        "for struct_type in structure_types:\n",
+        "    entry = next((e for e in manifest[\"sources\"] if e.get(\"form_factor\") == struct_type), None)\n",


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Drive structure examples from the manifest values.

The printed form factors are Interface, Bulk, Slab, Surface, and Monolayer, but this hardcoded list asks for Grain Boundary, Defect, and 3D Crystal, so those demos silently skip.

Suggested dynamic selection

-structure_types = ["Interface", "Slab", "Grain Boundary", "Defect", "3D Crystal"] +structure_types = [form_factor for form_factor, _ in form_factors.most_common() if form_factor]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"# Sample different structure types\n",

"structure_types = [\"Interface\", \"Slab\", \"Grain Boundary\", \"Defect\", \"3D Crystal\"]\n",

"\n",

"print(\"Structure Type Demonstrations:\\n\")\n",

"for struct_type in structure_types:\n",

" entry = next((e for e in manifest[\"sources\"] if e.get(\"form_factor\") == struct_type), None)\n",

"# Sample different structure types\n",

"structure_types = [form_factor for form_factor, _ in form_factors.most_common() if form_factor]\n",

"\n",

"print(\"Structure Type Demonstrations:\\n\")\n",

"for struct_type in structure_types:\n",

" entry = next((e for e in manifest[\"sources\"] if e.get(\"form_factor\") == struct_type), None)\n",

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 537 - 542, The demo in the notebook is using a hardcoded list of structure types that doesn’t match the manifest’s actual form_factor values, causing some examples to be skipped. Update the structure selection logic around the structure_types list and the manifest["sources"] lookup so the displayed demos are driven from the unique form_factor values present in the manifest (for example via dynamic extraction/deduplication) instead of fixed labels like Grain Boundary, Defect, and 3D Crystal.

coderabbitai · 2026-06-26T16:12:29Z

+        "print(f\"  All structures successfully converted to ASE/pymatgen: \u2713\")\n",
+        "print(f\"  Provenance metadata preserved: \u2713\")\n",


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Compute these checkmarks instead of asserting them.

The notebook only converts sampled structures and samples provenance metadata, so these final lines can report success even when some entries fail conversion or lack provenance.

Suggested verification sweep

-print(f" All structures successfully converted to ASE/pymatgen: ✓") -print(f" Provenance metadata preserved: ✓") +conversion_failures = [] +provenance_count = 0 +for entry in manifest["sources"]: + data = load_structure(entry) + try: + mat3ra_to_pymatgen(data) + mat3ra_to_ase(data) + except Exception as exc: + conversion_failures.append((entry.get("name"), type(exc).__name__)) + if (data.get("metadata") or {}).get("build"): + provenance_count += 1 + +print(f" Structures converted successfully: {len(manifest['sources']) - len(conversion_failures)}/{len(manifest['sources'])}") +print(f" Structures with provenance metadata: {provenance_count}/{len(manifest['sources'])}")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"print(f\" All structures successfully converted to ASE/pymatgen: \u2713\")\n",

"print(f\" Provenance metadata preserved: \u2713\")\n",

conversion_failures = []

provenance_count = 0

for entry in manifest["sources"]:

data = load_structure(entry)

try:

mat3ra_to_pymatgen(data)

mat3ra_to_ase(data)

except Exception as exc:

conversion_failures.append((entry.get("name"), type(exc).__name__))

if (data.get("metadata") or {}).get("build"):

provenance_count += 1

print(f" Structures converted successfully: {len(manifest['sources']) - len(conversion_failures)}/{len(manifest['sources'])}")

print(f" Structures with provenance metadata: {provenance_count}/{len(manifest['sources'])}")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 606 - 607, The final success messages are hardcoded in the notebook and should be computed from the actual conversion/provenance results instead of always printing checkmarks. Update the cells around the sampled structure conversion and provenance sampling logic so the summary reflects whether every sampled structure converted successfully and whether provenance metadata was actually preserved for the processed entries, using the existing notebook flow rather than static prints.

timurbazhirov changed the title ~~update: nb~~ update: mat3ra-2d 2026-06 review response nb Jun 26, 2026

update: nb interoperability

98999c8

VsevolodX force-pushed the feat/mat3rials-explorer branch from 52199f4 to 98999c8 Compare June 26, 2026 16:12

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

update: nb interoperability

34843d3

timurbazhirov merged commit 2f42f7a into main Jun 26, 2026
6 of 7 checks passed

timurbazhirov deleted the feat/mat3rials-explorer branch June 26, 2026 16:17

		"print(f\" All structures successfully converted to ASE/pymatgen: \u2713\")\n",
		"print(f\" Provenance metadata preserved: \u2713\")\n",

-        "print(f\"  All structures successfully converted to ASE/pymatgen: \u2713\")\n",
-        "print(f\"  Provenance metadata preserved: \u2713\")\n",
+conversion_failures = []
+provenance_count = 0
+for entry in manifest["sources"]:
+    data = load_structure(entry)
+    try:
+        mat3ra_to_pymatgen(data)
+        mat3ra_to_ase(data)
+    except Exception as exc:
+        conversion_failures.append((entry.get("name"), type(exc).__name__))
+    if (data.get("metadata") or {}).get("build"):
+        provenance_count += 1
+print(f"  Structures converted successfully: {len(manifest['sources']) - len(conversion_failures)}/{len(manifest['sources'])}")
+print(f"  Structures with provenance metadata: {provenance_count}/{len(manifest['sources'])}")

Uh oh!

Conversation

VsevolodX commented Jun 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

review-notebook-app Bot commented Jun 26, 2026

Uh oh!

coderabbitai Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

VsevolodX commented Jun 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading