
update: mat3ra-2d 2026-06 review response nb #338

Merged
timurbazhirov merged 2 commits into main from feat/mat3rials-explorer
Jun 26, 2026

Conversation

@VsevolodX

@VsevolodX VsevolodX commented Jun 26, 2026

Member

Summary by CodeRabbit

  • New Features
    • Added a new interactive notebook demonstrating dataset download and local caching, structure loading, and manifest-based checks.
    • Showcases structural diversity stats, conversions to common structure formats, and key slab properties.
    • Includes provenance examples that display interface, slab, and embedded bulk metadata relationships.
    • Adds optional examples for grain-boundary entries and batch summaries across multiple structure types.

@review-notebook-app


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@coderabbitai

coderabbitai Bot commented Jun 26, 2026


Review Change Stack

Warning

Review limit reached

@VsevolodX, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes and 49 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3e9820c-f4d9-456f-899a-62f6154e73d3

📥 Commits

Reviewing files that changed from the base of the PR and between 52199f4 and 34843d3.

📒 Files selected for processing (1)
  • other/mat3rials_explorer/provenance_and_interoperability_demo.ipynb
📝 Walkthrough

Walkthrough

Adds a notebook that downloads the Mat3rials Explorer dataset with local caching, loads its manifest, computes dataset diversity stats, converts structures to pymatgen and ASE, and prints slab, provenance, grain-boundary, and multi-structure summaries.

Changes

Material provenance and interoperability demo

Layer / File(s) | Summary
Dataset fetch and manifest, examples/material/provenance_and_interoperability_demo.ipynb | Resolves the Figshare article id, downloads the dataset into a local cache, validates manifest.yaml, and loads the manifest.
Structural diversity stats, examples/material/provenance_and_interoperability_demo.ipynb | Counts form factors and dimensionalities from manifest["sources"] and prints totals and unique formula counts.
Conversion helpers and slab example, examples/material/provenance_and_interoperability_demo.ipynb | Installs ASE and pymatgen if needed, defines Mat3ra JSON converters, and runs a slab conversion example through both adapters.
Provenance chain and grain boundary, examples/material/provenance_and_interoperability_demo.ipynb | Extracts build configuration data, prints interface and slab provenance fields, and reports a grain-boundary entry's metadata keys.
Structure sweep and wrap-up, examples/material/provenance_and_interoperability_demo.ipynb | Iterates across selected structure types, prints pymatgen summaries, and ends with notebook statistics and metadata.

Sequence Diagram(s)

sequenceDiagram
  participant Notebook as Notebook cell
  participant Cache as dataset_is_cached
  participant Download as download_figshare_article
  participant Pymatgen as mat3ra_to_pymatgen
  participant ASE as mat3ra_to_ase
  participant Provenance as get_interface_configuration

  Notebook->>Cache: check data_dir against manifest.yaml
  Cache-->>Notebook: cache hit or miss
  Notebook->>Download: fetch the Figshare article when needed
  Notebook->>Pymatgen: convert Mat3ra JSON to Structure
  Notebook->>ASE: convert Mat3ra JSON to Atoms
  Notebook->>Provenance: extract InterfaceConfiguration metadata

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through Figshare, swift and bright,
With slab and cache and manifest in sight.
I nibbled provenance, hop by hop,
And watched the structure stories pop.
Pymatgen, ASE, now dance as one—
A bunny’s notebook hop is done.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name | Status | Explanation | Resolution
Title check | ❓ Inconclusive | The title is only loosely related to the notebook and reads more like a work note than a clear summary of the change. | Rename it to describe the new notebook, e.g. "Add provenance and interoperability demo notebook for Mat3rials Explorer".
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/mat3rials-explorer

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@timurbazhirov timurbazhirov changed the title from "update: nb" to "update: mat3ra-2d 2026-06 review response nb" Jun 26, 2026
@VsevolodX VsevolodX force-pushed the feat/mat3rials-explorer branch from 52199f4 to 98999c8 Compare June 26, 2026 16:12

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
examples/material/provenance_and_interoperability_demo.ipynb (1)

304-304: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Summarize symbols instead of printing every atom.

For large slabs this bloats the notebook output and makes reviews/renders noisy.

Suggested output reduction
-    print(f"  Chemical symbols: {ase_atoms.get_chemical_symbols()}")
+    print(f"  Chemical symbols: {dict(Counter(ase_atoms.get_chemical_symbols()))}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/material/provenance_and_interoperability_demo.ipynb` at line 304,
The notebook cell currently prints every entry from
ase_atoms.get_chemical_symbols(), which creates noisy output for large slabs.
Update the nearby print logic in the demo cell to summarize the composition
instead of listing all atoms, using a concise aggregate derived from
ase_atoms.get_chemical_symbols() so the output stays compact and readable.
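
A minimal sketch of the aggregate suggested above: `collections.Counter` collapses a per-atom symbol list into per-element counts. The `summarize_symbols` helper and the sample list are illustrative assumptions; in the notebook the list would come from `ase_atoms.get_chemical_symbols()`, and `Counter` would need to be imported if the notebook does not already do so.

```python
from collections import Counter


def summarize_symbols(symbols):
    """Collapse a per-atom list of chemical symbols into element counts."""
    return dict(Counter(symbols))


# Stand-in for ase_atoms.get_chemical_symbols() on a small slab (hypothetical data).
symbols = ["Si", "Si", "O", "O", "O", "O"]
summary = summarize_symbols(symbols)
print(f"  Chemical symbols: {summary}")
```

The printed line grows with the number of distinct elements rather than with the number of atoms, which is what keeps large slabs readable.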
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/material/provenance_and_interoperability_demo.ipynb`:
- Around line 21-28: The notebook cell contains stale execution output that no
longer matches the current source and includes a machine-specific temp path.
Clear the saved outputs or rerun the notebook end-to-end so the outputs in
provenance_and_interoperability_demo.ipynb reflect the current behavior,
especially the cell that prints the Mat3rials Explorer download messages.
- Around line 179-196: The notebook install cells for ASE and pymatgen use
unpinned !pip installs that can drift from the versions declared in
pyproject.toml and may target the wrong interpreter. Update the import fallback
logic in the notebook cell that handles ase and pymatgen installation to use
%pip (or install the project dependencies directly) and pin the versions to
match the repository’s declared compatibility, keeping the existing import flow
in the same cell.
- Around line 537-542: The demo in the notebook is using a hardcoded list of
structure types that doesn’t match the manifest’s actual form_factor values,
causing some examples to be skipped. Update the structure selection logic around
the structure_types list and the manifest["sources"] lookup so the displayed
demos are driven from the unique form_factor values present in the manifest (for
example via dynamic extraction/deduplication) instead of fixed labels like Grain
Boundary, Defect, and 3D Crystal.
- Around line 606-607: The final success messages are hardcoded in the notebook
and should be computed from the actual conversion/provenance results instead of
always printing checkmarks. Update the cells around the sampled structure
conversion and provenance sampling logic so the summary reflects whether every
sampled structure converted successfully and whether provenance metadata was
actually preserved for the processed entries, using the existing notebook flow
rather than static prints.
- Around line 67-84: The Figshare download flow in figshare_article_id and
download_figshare_article has no explicit timeout handling and writes directly
to the final target, which can leave bad cached files after interruptions.
Update both urllib.request.urlopen calls to use a timeout, and change the file
download path in download_figshare_article so it downloads to a
temporary/incomplete file first and only moves it into place once the transfer
succeeds. Keep the existing article_id and article/files flow intact while
making the caching behavior atomic.
- Around line 88-107: Validate the loaded manifest before using it in
dataset_is_cached and after reading MANIFEST_NAME in the notebook cell, since
yaml.safe_load() may return None or a non-dict. Update the manifest handling
around dataset_is_cached and the later manifest access so it checks for a
mapping and a valid sources field before calling .get() or indexing
manifest["sources"], and fail early with a clear error if the manifest is
invalid.
- Around line 81-84: The file download loop in the notebook uses
file_info["name"] directly to build the local target path, which allows path
traversal or absolute-path escapes. Update the file handling in the
article["files"] iteration to sanitize and validate each filename before joining
it to data_dir, ensuring the resolved destination stays under the cache
directory. Keep the logic anchored around the existing file_info["name"],
target, and urllib.request.urlretrieve flow.

---

Nitpick comments:
In `@examples/material/provenance_and_interoperability_demo.ipynb`:
- Line 304: The notebook cell currently prints every entry from
ase_atoms.get_chemical_symbols(), which creates noisy output for large slabs.
Update the nearby print logic in the demo cell to summarize the composition
instead of listing all atoms, using a concise aggregate derived from
ase_atoms.get_chemical_symbols() so the output stays compact and readable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 13a9a2f5-ffd8-472e-8bc9-c0c5357d763e

📥 Commits

Reviewing files that changed from the base of the PR and between 8e1a82d and 52199f4.

📒 Files selected for processing (1)
  • examples/material/provenance_and_interoperability_demo.ipynb

Comment on lines +21 to +28
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading Mat3rials Explorer dataset from Figshare...\n",
"Downloaded 77 structures to /var/folders/wq/kjb0_d9126xd_3j3c13f7n9w0000gn/T/mat3rials_figshare_j72yv3nv\n"
]


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Clear or rerun stale notebook outputs.

This output still shows a local temp path and messages that no longer match the current source cell. Clear outputs before commit or rerun the notebook end-to-end after the final code changes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 21
- 28, The notebook cell contains stale execution output that no longer matches
the current source and includes a machine-specific temp path. Clear the saved
outputs or rerun the notebook end-to-end so the outputs in
provenance_and_interoperability_demo.ipynb reflect the current behavior,
especially the cell that prints the Mat3rials Explorer download messages.
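
One way to clear saved outputs before committing, sketched with only the standard library. It assumes the notebook follows the nbformat v4 layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); the `clear_outputs` helper and the in-memory notebook are hypothetical stand-ins for the real `.ipynb` file.

```python
import json


def clear_outputs(notebook):
    """Strip saved outputs and execution counts from an nbformat v4 notebook dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# Tiny in-memory notebook standing in for the real .ipynb file (hypothetical content).
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "source": ["print('hello')"],
            "outputs": [{"name": "stdout", "output_type": "stream", "text": ["hello\n"]}],
            "metadata": {},
        },
        {"cell_type": "markdown", "source": ["# Title"], "metadata": {}},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = clear_outputs(notebook)
print(json.dumps(cleaned["cells"][0], indent=1))
```

For a file on disk, the same idea applies after `json.load` and before `json.dump`. Running `jupyter nbconvert --clear-output --inplace` on the notebook should give an equivalent result where nbconvert is installed.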

Comment on lines +67 to +84
" with urllib.request.urlopen(url) as response:\n",
" final_url = response.geturl()\n",
" match = re.search(r\"/(\\d+)(?:/|$)\", urllib.parse.urlparse(final_url).path)\n",
" if not match:\n",
" raise ValueError(f\"Could not find Figshare article id in {url}\")\n",
" return match.group(1)\n",
"\n",
"\n",
"def download_figshare_article(url, data_dir):\n",
" article_id = figshare_article_id(url)\n",
" api_url = f\"https://api.figshare.com/v2/articles/{article_id}\"\n",
" with urllib.request.urlopen(api_url) as response:\n",
" article = json.load(response)\n",
" data_dir.mkdir(parents=True, exist_ok=True)\n",
" for file_info in article[\"files\"]:\n",
" target = data_dir / file_info[\"name\"]\n",
" target.parent.mkdir(parents=True, exist_ok=True)\n",
" urllib.request.urlretrieve(file_info[\"download_url\"], target)\n",


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Add explicit network timeouts and avoid partial cache files.

The Figshare requests can hang indefinitely, and urlretrieve() writes directly to the final path, so an interrupted download can later be treated as cached.

Suggested download helper
+import shutil
+
+FIGSHARE_TIMEOUT_SECONDS = 60
+
 ...
-    with urllib.request.urlopen(url) as response:
+    with urllib.request.urlopen(url, timeout=FIGSHARE_TIMEOUT_SECONDS) as response:
         final_url = response.geturl()
 ...
-    with urllib.request.urlopen(api_url) as response:
+    with urllib.request.urlopen(api_url, timeout=FIGSHARE_TIMEOUT_SECONDS) as response:
         article = json.load(response)
 ...
-        urllib.request.urlretrieve(file_info["download_url"], target)
+        tmp_target = target.with_suffix(target.suffix + ".part")
+        with urllib.request.urlopen(file_info["download_url"], timeout=FIGSHARE_TIMEOUT_SECONDS) as response:
+            with tmp_target.open("wb") as output:
+                shutil.copyfileobj(response, output)
+        tmp_target.replace(target)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
" with urllib.request.urlopen(url) as response:\n",
" final_url = response.geturl()\n",
" match = re.search(r\"/(\\d+)(?:/|$)\", urllib.parse.urlparse(final_url).path)\n",
" if not match:\n",
" raise ValueError(f\"Could not find Figshare article id in {url}\")\n",
" return match.group(1)\n",
"\n",
"\n",
"def download_figshare_article(url, data_dir):\n",
" article_id = figshare_article_id(url)\n",
" api_url = f\"https://api.figshare.com/v2/articles/{article_id}\"\n",
" with urllib.request.urlopen(api_url) as response:\n",
" article = json.load(response)\n",
" data_dir.mkdir(parents=True, exist_ok=True)\n",
" for file_info in article[\"files\"]:\n",
" target = data_dir / file_info[\"name\"]\n",
" target.parent.mkdir(parents=True, exist_ok=True)\n",
" urllib.request.urlretrieve(file_info[\"download_url\"], target)\n",
import shutil

FIGSHARE_TIMEOUT_SECONDS = 60


def figshare_article_id(url):
    with urllib.request.urlopen(url, timeout=FIGSHARE_TIMEOUT_SECONDS) as response:
        final_url = response.geturl()
    match = re.search(r"/(\d+)(?:/|$)", urllib.parse.urlparse(final_url).path)
    if not match:
        raise ValueError(f"Could not find Figshare article id in {url}")
    return match.group(1)


def download_figshare_article(url, data_dir):
    article_id = figshare_article_id(url)
    api_url = f"https://api.figshare.com/v2/articles/{article_id}"
    with urllib.request.urlopen(api_url, timeout=FIGSHARE_TIMEOUT_SECONDS) as response:
        article = json.load(response)
    data_dir.mkdir(parents=True, exist_ok=True)
    for file_info in article["files"]:
        target = data_dir / file_info["name"]
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp_target = target.with_suffix(target.suffix + ".part")
        with urllib.request.urlopen(file_info["download_url"], timeout=FIGSHARE_TIMEOUT_SECONDS) as response:
            with tmp_target.open("wb") as output:
                shutil.copyfileobj(response, output)
        tmp_target.replace(target)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 67
- 84, The Figshare download flow in figshare_article_id and
download_figshare_article has no explicit timeout handling and writes directly
to the final target, which can leave bad cached files after interruptions.
Update both urllib.request.urlopen calls to use a timeout, and change the file
download path in download_figshare_article so it downloads to a
temporary/incomplete file first and only moves it into place once the transfer
succeeds. Keep the existing article_id and article/files flow intact while
making the caching behavior atomic.
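
The write-then-rename step can be sketched in isolation. This is an illustrative variant, not the notebook's code: the `atomic_write` helper is hypothetical, and an `io.BytesIO` object stands in for the HTTP response so the example runs without network access. Unlike the suggestion above, it also deletes the partial file when the copy fails.

```python
import io
import os
import shutil
import tempfile
from pathlib import Path


def atomic_write(stream, target):
    """Copy a binary stream to target via a temporary file in the same directory."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Same directory keeps os.replace on one filesystem (atomic on POSIX).
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as output:
            shutil.copyfileobj(stream, output)
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the partial file so it can never be mistaken for a cached one.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


# io.BytesIO stands in for the response returned by urlopen(..., timeout=...).
with tempfile.TemporaryDirectory() as tmp_dir:
    written = atomic_write(io.BytesIO(b"sources: []\n"), Path(tmp_dir) / "manifest.yaml")
    content = written.read_bytes()
    leftovers = sorted(p.name for p in Path(tmp_dir).iterdir())
print(content, leftovers)
```

Because the final name only appears after the copy completes, a cache check that looks for the final filename cannot see a half-written file.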

Comment on lines +81 to +84
" for file_info in article[\"files\"]:\n",
" target = data_dir / file_info[\"name\"]\n",
" target.parent.mkdir(parents=True, exist_ok=True)\n",
" urllib.request.urlretrieve(file_info[\"download_url\"], target)\n",


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Constrain Figshare filenames before writing them locally.

file_info["name"] comes from an external API; an absolute path or ../ segment would escape data_dir and overwrite files outside the cache.

Suggested hardening
     data_dir.mkdir(parents=True, exist_ok=True)
+    base_dir = data_dir.resolve()
     for file_info in article["files"]:
-        target = data_dir / file_info["name"]
+        target = (base_dir / file_info["name"]).resolve()
+        if not target.is_relative_to(base_dir):
+            raise ValueError(f"Unsafe Figshare filename: {file_info['name']}")
         target.parent.mkdir(parents=True, exist_ok=True)
         urllib.request.urlretrieve(file_info["download_url"], target)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
" for file_info in article[\"files\"]:\n",
" target = data_dir / file_info[\"name\"]\n",
" target.parent.mkdir(parents=True, exist_ok=True)\n",
" urllib.request.urlretrieve(file_info[\"download_url\"], target)\n",
" data_dir.mkdir(parents=True, exist_ok=True)\n",
" base_dir = data_dir.resolve()\n",
" for file_info in article[\"files\"]:\n",
" target = (base_dir / file_info[\"name\"]).resolve()\n",
" if not target.is_relative_to(base_dir):\n",
" raise ValueError(f\"Unsafe Figshare filename: {file_info['name']}\")\n",
" target.parent.mkdir(parents=True, exist_ok=True)\n",
" urllib.request.urlretrieve(file_info[\"download_url\"], target)\n",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 81
- 84, The file download loop in the notebook uses file_info["name"] directly to
build the local target path, which allows path traversal or absolute-path
escapes. Update the file handling in the article["files"] iteration to sanitize
and validate each filename before joining it to data_dir, ensuring the resolved
destination stays under the cache directory. Keep the logic anchored around the
existing file_info["name"], target, and urllib.request.urlretrieve flow.
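
The containment check can be exercised on its own. The sketch below is an assumption-labeled variant of the hardening above: `safe_target` and `is_safe` are hypothetical helper names, the `cache` directory need not exist, and the check uses `Path.parents` instead of `is_relative_to` so it also runs on Python versions older than 3.9.

```python
from pathlib import Path


def safe_target(base_dir, name):
    """Return the resolved path for name under base_dir, or raise ValueError."""
    base = Path(base_dir).resolve()
    target = (base / name).resolve()
    # An absolute name replaces base entirely; '..' segments can climb out of it.
    if target == base or base not in target.parents:
        raise ValueError(f"Unsafe filename: {name!r}")
    return target


def is_safe(base_dir, name):
    """Report whether name stays inside base_dir without raising."""
    try:
        safe_target(base_dir, name)
    except ValueError:
        return False
    return True


names = ["manifest.yaml", "slabs/Si.json", "../evil.json", "/etc/passwd"]
results = {name: is_safe("cache", name) for name in names}
print(results)
```

Resolving both paths before comparing them is what catches `..` segments; a plain string prefix check on the unresolved path would not.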

Comment on lines +88 to +107
"def dataset_is_cached(data_dir):\n",
" manifest_path = data_dir / MANIFEST_NAME\n",
" if not manifest_path.exists():\n",
" return False\n",
" manifest = yaml.safe_load(manifest_path.read_text())\n",
" sources = manifest.get(\"sources\") or []\n",
" if not sources:\n",
" return False\n",
" return all(structure_path(entry[\"filename\"]).exists() for entry in sources)\n",
"\n",
"\n",
"DATA_DIR = default_data_dir()\n",
"if dataset_is_cached(DATA_DIR):\n",
" print(f\"Using cached dataset in {DATA_DIR.resolve()}\")\n",
"else:\n",
" print(f\"Downloading Mat3rials Explorer dataset from Figshare to {DATA_DIR.resolve()}...\")\n",
" download_figshare_article(FIGSHARE_URL, DATA_DIR)\n",
"\n",
"manifest = yaml.safe_load((DATA_DIR / MANIFEST_NAME).read_text())\n",
"print(f\"Loaded {len(manifest['sources'])} structures from {DATA_DIR.resolve()}\")"


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Validate manifest.yaml before cache checks and indexing.

yaml.safe_load() can return None or a non-dict, but the code immediately calls .get() and later indexes manifest["sources"]. Validate once and fail with a clear message.

Suggested manifest loader
+def load_manifest(data_dir):
+    manifest_path = data_dir / MANIFEST_NAME
+    manifest = yaml.safe_load(manifest_path.read_text())
+    if not isinstance(manifest, dict) or not isinstance(manifest.get("sources"), list):
+        raise ValueError(f"{manifest_path} must contain a 'sources' list")
+    return manifest
+
+
 def dataset_is_cached(data_dir):
-    manifest_path = data_dir / MANIFEST_NAME
-    if not manifest_path.exists():
-        return False
-    manifest = yaml.safe_load(manifest_path.read_text())
+    try:
+        manifest = load_manifest(data_dir)
+    except (FileNotFoundError, ValueError):
+        return False
     sources = manifest.get("sources") or []
 ...
-manifest = yaml.safe_load((DATA_DIR / MANIFEST_NAME).read_text())
+manifest = load_manifest(DATA_DIR)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"def dataset_is_cached(data_dir):\n",
" manifest_path = data_dir / MANIFEST_NAME\n",
" if not manifest_path.exists():\n",
" return False\n",
" manifest = yaml.safe_load(manifest_path.read_text())\n",
" sources = manifest.get(\"sources\") or []\n",
" if not sources:\n",
" return False\n",
" return all(structure_path(entry[\"filename\"]).exists() for entry in sources)\n",
"\n",
"\n",
"DATA_DIR = default_data_dir()\n",
"if dataset_is_cached(DATA_DIR):\n",
" print(f\"Using cached dataset in {DATA_DIR.resolve()}\")\n",
"else:\n",
" print(f\"Downloading Mat3rials Explorer dataset from Figshare to {DATA_DIR.resolve()}...\")\n",
" download_figshare_article(FIGSHARE_URL, DATA_DIR)\n",
"\n",
"manifest = yaml.safe_load((DATA_DIR / MANIFEST_NAME).read_text())\n",
"print(f\"Loaded {len(manifest['sources'])} structures from {DATA_DIR.resolve()}\")"
"def load_manifest(data_dir):\n",
" manifest_path = data_dir / MANIFEST_NAME\n",
" manifest = yaml.safe_load(manifest_path.read_text())\n",
" if not isinstance(manifest, dict) or not isinstance(manifest.get(\"sources\"), list):\n",
" raise ValueError(f\"{manifest_path} must contain a 'sources' list\")\n",
" return manifest\n",
"\n",
"\n",
"def dataset_is_cached(data_dir):\n",
" try:\n",
" manifest = load_manifest(data_dir)\n",
" except (FileNotFoundError, ValueError):\n",
" return False\n",
" sources = manifest.get(\"sources\") or []\n",
" if not sources:\n",
" return False\n",
" return all(structure_path(entry[\"filename\"]).exists() for entry in sources)\n",
"\n",
"\n",
"DATA_DIR = default_data_dir()\n",
"if dataset_is_cached(DATA_DIR):\n",
" print(f\"Using cached dataset in {DATA_DIR.resolve()}\")\n",
"else:\n",
" print(f\"Downloading Mat3rials Explorer dataset from Figshare to {DATA_DIR.resolve()}...\")\n",
" download_figshare_article(FIGSHARE_URL, DATA_DIR)\n",
"\n",
"manifest = load_manifest(DATA_DIR)\n",
"print(f\"Loaded {len(manifest['sources'])} structures from {DATA_DIR.resolve()}\")"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines 88
- 107, Validate the loaded manifest before using it in dataset_is_cached and
after reading MANIFEST_NAME in the notebook cell, since yaml.safe_load() may
return None or a non-dict. Update the manifest handling around dataset_is_cached
and the later manifest access so it checks for a mapping and a valid sources
field before calling .get() or indexing manifest["sources"], and fail early with
a clear error if the manifest is invalid.
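
The shape check can be tested without PyYAML by validating the already-parsed object. In this sketch `validate_manifest` and `describe` are hypothetical helpers; the inputs mimic what `yaml.safe_load()` can hand back, including `None` for an empty file.

```python
def validate_manifest(manifest, origin="manifest.yaml"):
    """Return manifest if it is a mapping with a list under 'sources', else raise."""
    if not isinstance(manifest, dict):
        raise ValueError(f"{origin} must be a mapping, got {type(manifest).__name__}")
    if not isinstance(manifest.get("sources"), list):
        raise ValueError(f"{origin} must contain a 'sources' list")
    return manifest


def describe(candidate):
    """Return 'ok' for a valid manifest, otherwise the validation message."""
    try:
        validate_manifest(candidate)
    except ValueError as exc:
        return str(exc)
    return "ok"


# None is what yaml.safe_load() returns for an empty file.
outcomes = [describe(None), describe(["a"]), describe({"sources": None}), describe({"sources": []})]
print(outcomes)
```

An empty `sources` list passes validation here on purpose: the cache check can still treat it as "not cached", while a malformed file fails loudly.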

Comment on lines +179 to +196
"# Install required packages if needed\n",
"try:\n",
" import ase\n",
" from ase.visualize import view\n",
"except ImportError:\n",
" print(\"Installing ASE...\")\n",
" !pip install ase\n",
" import ase\n",
" from ase.visualize import view\n",
"\n",
"try:\n",
" import pymatgen\n",
" from pymatgen.core import Structure as PymatgenStructure\n",
"except ImportError:\n",
" print(\"Installing pymatgen...\")\n",
" !pip install pymatgen\n",
" import pymatgen\n",
" from pymatgen.core import Structure as PymatgenStructure"


🎯 Functional Correctness | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Find project dependency declarations for ASE/pymatgen.
fd -i 'pyproject\.toml|requirements.*\.txt|environment.*\.ya?ml|setup\.py|setup\.cfg|Pipfile|poetry\.lock' . \
  -x sh -c 'printf "\n### %s\n" "$1"; rg -n "(^|[\"'"'"'[:space:]])(ase|pymatgen)([<=>[:space:]\"'"'"']|$)" "$1" || true' sh {}

Repository: mat3ra/api-examples

Length of output: 418


Pin ase and pymatgen versions in notebook installs to match pyproject.toml.

The notebook uses unpinned !pip install which may target the wrong interpreter and cause version drift. The repository already declares compatible versions in pyproject.toml:

  • ase>=3.25.0
  • pymatgen==2024.4.13

Update the install logic to use these version specifiers and prefer %pip so the install targets the running kernel:

Current problematic pattern
!pip install ase
!pip install pymatgen

Replace with:

%pip install "ase>=3.25.0" "pymatgen==2024.4.13"

Or install the project dependencies directly if available:

%pip install -e .
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines
179 - 196, The notebook install cells for ASE and pymatgen use unpinned !pip
installs that can drift from the versions declared in pyproject.toml and may
target the wrong interpreter. Update the import fallback logic in the notebook
cell that handles ase and pymatgen installation to use %pip (or install the
project dependencies directly) and pin the versions to match the repository’s
declared compatibility, keeping the existing import flow in the same cell.
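
Outside IPython, where `%pip` is unavailable, a comparable effect can be had by invoking pip through the running interpreter. The sketch below only builds the command and does not install anything; `REQUIREMENTS`, `missing_requirements`, and `pip_install_command` are hypothetical names, and the pins are copied from the review comment above rather than read from `pyproject.toml`.

```python
import importlib.util
import sys

# Pins copied from the review comment above; adjust if pyproject.toml changes.
REQUIREMENTS = {"ase": "ase>=3.25.0", "pymatgen": "pymatgen==2024.4.13"}


def missing_requirements(requirements):
    """Return the pinned specifiers whose import name cannot be found."""
    return [
        spec
        for module, spec in requirements.items()
        if importlib.util.find_spec(module) is None
    ]


def pip_install_command(specs):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *specs]


command = pip_install_command(REQUIREMENTS.values())
print("Missing:", missing_requirements(REQUIREMENTS))
print("Would run:", command)
```

Passing the resulting list to `subprocess.check_call` would perform the install against the same interpreter that runs the notebook, which is the property `%pip` provides inside a kernel.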

Comment on lines +537 to +542
"# Sample different structure types\n",
"structure_types = [\"Interface\", \"Slab\", \"Grain Boundary\", \"Defect\", \"3D Crystal\"]\n",
"\n",
"print(\"Structure Type Demonstrations:\\n\")\n",
"for struct_type in structure_types:\n",
" entry = next((e for e in manifest[\"sources\"] if e.get(\"form_factor\") == struct_type), None)\n",


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Drive structure examples from the manifest values.

The printed form factors are Interface, Bulk, Slab, Surface, and Monolayer, but this hardcoded list asks for Grain Boundary, Defect, and 3D Crystal, so those demos silently skip.

Suggested dynamic selection
-structure_types = ["Interface", "Slab", "Grain Boundary", "Defect", "3D Crystal"]
+structure_types = [form_factor for form_factor, _ in form_factors.most_common() if form_factor]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"# Sample different structure types\n",
"structure_types = [\"Interface\", \"Slab\", \"Grain Boundary\", \"Defect\", \"3D Crystal\"]\n",
"\n",
"print(\"Structure Type Demonstrations:\\n\")\n",
"for struct_type in structure_types:\n",
" entry = next((e for e in manifest[\"sources\"] if e.get(\"form_factor\") == struct_type), None)\n",
"# Sample different structure types\n",
"structure_types = [form_factor for form_factor, _ in form_factors.most_common() if form_factor]\n",
"\n",
"print(\"Structure Type Demonstrations:\\n\")\n",
"for struct_type in structure_types:\n",
" entry = next((e for e in manifest[\"sources\"] if e.get(\"form_factor\") == struct_type), None)\n",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines
537 - 542, The demo in the notebook is using a hardcoded list of structure types
that doesn’t match the manifest’s actual form_factor values, causing some
examples to be skipped. Update the structure selection logic around the
structure_types list and the manifest["sources"] lookup so the displayed demos
are driven from the unique form_factor values present in the manifest (for
example via dynamic extraction/deduplication) instead of fixed labels like Grain
Boundary, Defect, and 3D Crystal.
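
A small self-contained illustration of deriving the labels from the data. The manifest excerpt is hypothetical, and `form_factors` is assumed to be a `collections.Counter` like the one the suggestion relies on.

```python
from collections import Counter

# Hypothetical manifest excerpt; the real entries come from manifest.yaml.
manifest = {
    "sources": [
        {"name": "a", "form_factor": "Slab"},
        {"name": "b", "form_factor": "Interface"},
        {"name": "c", "form_factor": "Slab"},
        {"name": "d", "form_factor": None},
        {"name": "e"},
    ]
}

form_factors = Counter(entry.get("form_factor") for entry in manifest["sources"])
# most_common() orders by count; the filter drops missing or empty labels.
structure_types = [label for label, _ in form_factors.most_common() if label]

# Every label now has a matching entry, so no demo is silently skipped.
first_entries = {
    label: next(e["name"] for e in manifest["sources"] if e.get("form_factor") == label)
    for label in structure_types
}
print(structure_types, first_entries)
```

Since each label is taken from an existing entry, the `next(..., None)` lookup in the notebook loop can no longer come back empty for a listed type.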

Comment on lines +606 to +607
"print(f\" All structures successfully converted to ASE/pymatgen: \u2713\")\n",
"print(f\" Provenance metadata preserved: \u2713\")\n",


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Compute these checkmarks instead of asserting them.

The notebook only converts sampled structures and samples provenance metadata, so these final lines can report success even when some entries fail conversion or lack provenance.

Suggested verification sweep
-print(f"  All structures successfully converted to ASE/pymatgen: ✓")
-print(f"  Provenance metadata preserved: ✓")
+conversion_failures = []
+provenance_count = 0
+for entry in manifest["sources"]:
+    data = load_structure(entry)
+    try:
+        mat3ra_to_pymatgen(data)
+        mat3ra_to_ase(data)
+    except Exception as exc:
+        conversion_failures.append((entry.get("name"), type(exc).__name__))
+    if (data.get("metadata") or {}).get("build"):
+        provenance_count += 1
+
+print(f"  Structures converted successfully: {len(manifest['sources']) - len(conversion_failures)}/{len(manifest['sources'])}")
+print(f"  Structures with provenance metadata: {provenance_count}/{len(manifest['sources'])}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"print(f\" All structures successfully converted to ASE/pymatgen: \u2713\")\n",
"print(f\" Provenance metadata preserved: \u2713\")\n",
conversion_failures = []
provenance_count = 0
for entry in manifest["sources"]:
    data = load_structure(entry)
    try:
        mat3ra_to_pymatgen(data)
        mat3ra_to_ase(data)
    except Exception as exc:
        conversion_failures.append((entry.get("name"), type(exc).__name__))
    if (data.get("metadata") or {}).get("build"):
        provenance_count += 1

print(f"  Structures converted successfully: {len(manifest['sources']) - len(conversion_failures)}/{len(manifest['sources'])}")
print(f"  Structures with provenance metadata: {provenance_count}/{len(manifest['sources'])}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/material/provenance_and_interoperability_demo.ipynb` around lines
606 - 607, The final success messages are hardcoded in the notebook and should
be computed from the actual conversion/provenance results instead of always
printing checkmarks. Update the cells around the sampled structure conversion
and provenance sampling logic so the summary reflects whether every sampled
structure converted successfully and whether provenance metadata was actually
preserved for the processed entries, using the existing notebook flow rather
than static prints.
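
The sweep can be tried out with stand-in functions before wiring it to the real converters. Everything named `fake_*` below is a hypothetical placeholder for `load_structure`, `mat3ra_to_pymatgen`, and `mat3ra_to_ase`; the `sweep` helper is likewise an illustrative assumption, and the `metadata.build` key follows the suggestion above.

```python
def sweep(entries, load, converters):
    """Convert every entry with every converter; collect failures and provenance hits."""
    failures = []
    with_provenance = 0
    for entry in entries:
        data = load(entry)
        for convert in converters:
            try:
                convert(data)
            except Exception as exc:  # record the failure and keep sweeping
                failures.append((entry.get("name"), convert.__name__, type(exc).__name__))
        if (data.get("metadata") or {}).get("build"):
            with_provenance += 1
    return failures, with_provenance


# Hypothetical stand-ins for load_structure and the two converters.
def fake_load(entry):
    return entry["data"]


def fake_converter(data):
    if "basis" not in data:
        raise KeyError("basis")
    return data["basis"]


entries = [
    {"name": "slab", "data": {"basis": [], "metadata": {"build": [{}]}}},
    {"name": "broken", "data": {"metadata": None}},
]
failures, with_provenance = sweep(entries, fake_load, [fake_converter])
total = len(entries)
failed_names = {name for name, _, _ in failures}
print(f"  Structures converted successfully: {total - len(failed_names)}/{total}")
print(f"  Structures with provenance metadata: {with_provenance}/{total}")
```

Recording the converter name alongside the exception type makes it possible to tell an ASE failure from a pymatgen one in the final summary.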

@timurbazhirov timurbazhirov merged commit 2f42f7a into main Jun 26, 2026
6 of 7 checks passed
@timurbazhirov timurbazhirov deleted the feat/mat3rials-explorer branch June 26, 2026 16:17

2 participants