diff --git a/README.md b/README.md index 3709e10..a35c38f 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,13 @@ +

+ LeMaterial +

+ + # LeMaterial-Fetcher -`lematerial-fetcher` is designed to fetch data from a specified OPTIMADE's compatible JSON-API, process it, and store it in a PostgreSQL database. It is highly concurrent, to handle data fetching and processing efficiently. +LeMaterial-Fetcher is designed to fetch data from any external source, process it, and store it in a PostgreSQL database in a pre-defined format with structure-level validation and database-level validators. It is highly concurrent to handle data fetching and processing efficiently. -The objective is to retrieve information from various OPTIMADE sources and establish a local database. This database will enable us to process and utilize the data according to our specific requirements, which can then be uploaded to an online and easily accessible place like Hugging Face. +The objective is to retrieve information from various sources and establish a unified local database. This database will enable us to process and utilize the data according to our specific requirements, which can then be uploaded to an online and easily accessible place like Hugging Face. **Explore the datasets built with this tool on [Hugging Face](https://huggingface.co/LeMaterial)** 🤗: @@ -29,18 +34,21 @@ We gratefully acknowledge these projects and their dedication to open materials ## Installation 1. Clone the repository: + ```bash git clone git@github.com:LeMaterial/lematerial-fetcher.git cd lematerial-fetcher ``` 2. Set up your environment variables. Copy the provided template and customize it: + ```bash cp .env.example .env vim .env ``` 3. Install the package: + ```bash # Using uv (recommended) uv add git+https://github.com/LeMaterial/lematerial-fetcher.git @@ -91,6 +99,7 @@ lematerial-fetcher [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS] ### Available Commands 1. 
**Materials Project (MP)** + ```bash # Fetch structures lematerial-fetcher mp fetch --table-name mp_structures --num-workers 4 @@ -103,6 +112,7 @@ lematerial-fetcher [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS] ``` 2. **Alexandria** + ```bash # Fetch structures lematerial-fetcher alexandria fetch --table-name alex_structures --functional pbe @@ -115,6 +125,7 @@ lematerial-fetcher [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS] ``` 3. **OQMD** + ```bash # Fetch data lematerial-fetcher oqmd fetch --table-name oqmd_structures @@ -124,6 +135,7 @@ lematerial-fetcher [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS] ``` 4. **Push to Hugging Face** + ```bash lematerial-fetcher push --table-name my_table --hf-repo-id my-repo ``` @@ -133,12 +145,14 @@ lematerial-fetcher [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS] These options are available across most commands: #### Database Options + - `--db-conn-str STR`: Complete database connection string - `--db-user USER`: Database username - `--db-host HOST`: Database host (default: localhost) - `--db-name NAME`: Database name (default: lematerial) #### Processing Options + - `--num-workers N`: Number of parallel workers - `--log-dir DIR`: Directory for logs (default: ./logs) - `--max-retries N`: Maximum retry attempts (default: 3) @@ -146,11 +160,13 @@ These options are available across most commands: - `--log-every N`: Log frequency (default: 1000) #### Fetch Options + - `--offset N`: Starting offset (default: 0) - `--table-name NAME`: Target table name - `--limit N`: Items per API request (default: 500) #### Transformer Options + - `--batch-size N`: Batch processing size (default: 500) - `--dest-table-name NAME`: Destination table name - `--traj`: Transform trajectory data @@ -158,6 +174,7 @@ These options are available across most commands: ### Examples 1. 
**Fetch from Materials Project with custom configuration**: + ```bash lematerial-fetcher mp fetch \ --table-name mp_structures \ @@ -168,6 +185,7 @@ These options are available across most commands: ``` 2. **Transform Alexandria data with source and destination databases**: + ```bash lematerial-fetcher alexandria transform \ --table-name source_table \ @@ -178,6 +196,7 @@ These options are available across most commands: ``` 3. **Push to Hugging Face with custom chunk size**: + ```bash lematerial-fetcher push \ --table-name my_table \ @@ -193,6 +212,7 @@ These options are available across most commands: You can configure the database connection in two ways: 1. **Using individual parameters**: + ```bash # Set password in environment export LEMATERIALFETCHER_DB_PASSWORD=your_password @@ -202,6 +222,7 @@ You can configure the database connection in two ways: ``` 2. **Using a connection string**: + ```bash lematerial-fetcher mp fetch --db-conn-str="host=localhost user=username password=password dbname=database_name sslmode=disable" ``` @@ -209,6 +230,7 @@ You can configure the database connection in two ways: ### MySQL Configuration (for OQMD) MySQL-specific options: + - `--mysql-host HOST`: MySQL host (default: localhost) - `--mysql-user USER`: MySQL username - `--mysql-database NAME`: MySQL database name (default: lematerial) diff --git a/pyproject.toml b/pyproject.toml index b96ad34..0faee0d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -21,6 +21,9 @@ dependencies = [ "beautifulsoup4>=4.13.3", "datasets>=3.4.1", "ijson>=3.3.0", + "moyopy>=0.4.2", + "ase>=3.24.0", + "material-hasher", ] [project.scripts] @@ -51,6 +54,9 @@ dev-dependencies = [ "botocore>=1.36.20", ] +[tool.uv.sources] +material-hasher = { git = "https://github.com/LeMaterial/lematerial-hasher.git" } + [tool.ruff.lint] extend-select = ["I"] diff --git a/src/lematerial_fetcher/database/postgres.py b/src/lematerial_fetcher/database/postgres.py index f77f691..df2416f 100644 --- 
a/src/lematerial_fetcher/database/postgres.py +++ b/src/lematerial_fetcher/database/postgres.py @@ -483,13 +483,17 @@ def columns(cls) -> dict[str, str]: "last_modified": "TIMESTAMP", "stress_tensor": "FLOAT[][]", "energy": "FLOAT", + "energy_corrected": "FLOAT", "magnetic_moments": "FLOAT[]", "forces": "FLOAT[][]", "total_magnetization": "FLOAT", "dos_ef": "FLOAT", + "charges": "FLOAT[]", + "band_gap_indirect": "FLOAT", "functional": "TEXT", + "space_group_it_number": "INTEGER", "cross_compatibility": "BOOLEAN", - "entalpic_fingerprint": "FLOAT[]", + "bawl_fingerprint": "TEXT", } def _prepare_species_data(self, species: list[dict[str, Any]]) -> list[Json]: @@ -557,13 +561,17 @@ def insert_data(self, structure: OptimadeStructure) -> None: structure.last_modified, structure.stress_tensor, structure.energy, + structure.energy_corrected, structure.magnetic_moments, structure.forces, structure.total_magnetization, structure.dos_ef, + structure.charges, + structure.band_gap_indirect, structure.functional, + structure.space_group_it_number, structure.cross_compatibility, - structure.entalpic_fingerprint, + structure.bawl_fingerprint, ) cur.execute(query, input_data) self.conn.commit() @@ -620,13 +628,17 @@ def batch_insert_data( structure.last_modified, structure.stress_tensor, structure.energy, + structure.energy_corrected, structure.magnetic_moments, structure.forces, structure.total_magnetization, structure.dos_ef, + structure.charges, + structure.band_gap_indirect, structure.functional, + structure.space_group_it_number, structure.cross_compatibility, - structure.entalpic_fingerprint, + structure.bawl_fingerprint, ) ) @@ -717,13 +729,17 @@ def insert_data(self, structure: Trajectory) -> None: structure.last_modified, structure.stress_tensor, structure.energy, + structure.energy_corrected, structure.magnetic_moments, structure.forces, structure.total_magnetization, structure.dos_ef, + structure.charges, + structure.band_gap_indirect, structure.functional, + 
structure.space_group_it_number, structure.cross_compatibility, - structure.entalpic_fingerprint, + structure.bawl_fingerprint, # trajectory-specific fields structure.relaxation_step, structure.relaxation_number, @@ -783,13 +799,17 @@ def batch_insert_data( structure.last_modified, structure.stress_tensor, structure.energy, + structure.energy_corrected, structure.magnetic_moments, structure.forces, structure.total_magnetization, structure.dos_ef, + structure.charges, + structure.band_gap_indirect, structure.functional, + structure.space_group_it_number, structure.cross_compatibility, - structure.entalpic_fingerprint, + structure.bawl_fingerprint, # trajectory-specific fields structure.relaxation_step, structure.relaxation_number, diff --git a/src/lematerial_fetcher/fetcher/alexandria/transform.py b/src/lematerial_fetcher/fetcher/alexandria/transform.py index 7feeb12..321850d 100644 --- a/src/lematerial_fetcher/fetcher/alexandria/transform.py +++ b/src/lematerial_fetcher/fetcher/alexandria/transform.py @@ -14,6 +14,15 @@ from lematerial_fetcher.utils.structure import get_optimade_from_pymatgen +def get_cross_compatibility(elements: list[str]) -> bool: + """ + Get the cross-compatibility of an Alexandria structure. + + Currently, Yb containing structures are not cross-compatible. + """ + return not any(element in ["Yb"] for element in elements) + + class AlexandriaTransformer(BaseTransformer): """ Alexandria transformer implementation. @@ -68,7 +77,8 @@ def transform_row( The transformed OptimadeStructure objects. If the list is empty, nothing from the structure should be included in the database. 
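Review note: the new `get_cross_compatibility` helper in alexandria/transform.py tests membership with `any(element in ["Yb"] for element in elements)`. An equivalent, slightly more extensible formulation using an exclusion set is sketched below; the Yb exclusion comes from the diff, while the set name and structure are ours:

```python
# Sketch equivalent to the `get_cross_compatibility` helper added in this diff,
# rewritten with an explicit exclusion set so further elements can be excluded
# later without touching the logic. Only "Yb" is excluded in the diff itself.
NON_COMPATIBLE_ELEMENTS = {"Yb"}

def get_cross_compatibility(elements: list[str]) -> bool:
    """A structure is cross-compatible unless it contains an excluded element."""
    return not NON_COMPATIBLE_ELEMENTS.intersection(elements)
```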
""" - key_mapping = { + + key_mapping_base = { "immutable_id": "immutable_id", "chemical_formula_reduced": "chemical_formula_reduced", "chemical_formula_anonymous": "chemical_formula_anonymous", @@ -84,36 +94,73 @@ def transform_row( "dimension_types": "dimension_types", "last_modified": "last_modified", "lattice_vectors": "lattice_vectors", + } + + key_mapping_functional = { + **key_mapping_base, "_alexandria_forces": "forces", "_alexandria_stress_tensor": "stress_tensor", "_alexandria_dos_ef": "dos_ef", "_alexandria_energy": "energy", + "_alexandria_energy_corrected": "energy_corrected", "_alexandria_magnetic_moments": "magnetic_moments", "_alexandria_magnetization": "total_magnetization", + "_alexandria_charges": "charges", + "_alexandria_band_gap": "band_gap_indirect", } - values_dict = {} - for key, value in key_mapping.items(): - values_dict[value] = raw_structure.attributes[key] + def get_structure_from_key_mapping( + key_mapping: dict[str, str], functional: Functional | None + ) -> OptimadeStructure: + values_dict = {} + for key, value in key_mapping.items(): + values_dict[value] = raw_structure.attributes[key] - optimade_structure = OptimadeStructure( - **values_dict, - id=raw_structure.id, # problem, this is empty - source="alexandria", - functional=self._alexandria_functional(raw_structure), - cross_compatibility=True, # All Alexandria structures have compatible parameters - ) + if functional is None: + functional = self._alexandria_functional(raw_structure) - return [optimade_structure] + optimade_structure = OptimadeStructure( + **values_dict, + id=f"{raw_structure.id}-{functional.value}", + source="alexandria", + functional=functional, + cross_compatibility=get_cross_compatibility(values_dict["elements"]), + compute_space_group=True, + compute_bawl_hash=True, + ) + + return optimade_structure + + structures = [get_structure_from_key_mapping(key_mapping_functional, None)] + + # Scan data is included with pbesol as fields with 'scan' prefix + if 
any("scan" in key for key in raw_structure.attributes): + key_mapping_scan = { + **key_mapping_base, + "_alexandria_scan_forces": "forces", + "_alexandria_scan_stress_tensor": "stress_tensor", + "_alexandria_scan_dos_ef": "dos_ef", + "_alexandria_scan_energy": "energy", + "_alexandria_scan_energy_corrected": "energy_corrected", + "_alexandria_scan_magnetic_moments": "magnetic_moments", + "_alexandria_scan_magnetization": "total_magnetization", + "_alexandria_scan_charges": "charges", + "_alexandria_scan_band_gap": "band_gap_indirect", + } + structures.append( + get_structure_from_key_mapping(key_mapping_scan, Functional.SCAN) + ) + + return structures def _alexandria_functional(self, raw_structure: RawStructure) -> Functional: """ Get the functional from the raw Alexandria structure. """ - if "pbe" in raw_structure.attributes["_alexandria_xc_functional"].lower(): - return Functional.PBE - elif "pbesol" in raw_structure.attributes["_alexandria_xc_functional"].lower(): + if "pbesol" in raw_structure.attributes["_alexandria_xc_functional"].lower(): return Functional.PBESOL + elif "pbe" in raw_structure.attributes["_alexandria_xc_functional"].lower(): + return Functional.PBE elif "scan" in raw_structure.attributes["_alexandria_xc_functional"].lower(): return Functional.SCAN else: @@ -192,6 +239,7 @@ def transform_row( trajectories = [] current_relaxation_number = 0 + energy_correction = None for relaxation_number, calc in enumerate(raw_structure.attributes): relaxation_steps = calc["steps"] for relaxation_step, relaxation_step_dict in enumerate(relaxation_steps): @@ -219,13 +267,26 @@ def transform_row( relaxation_number=relaxation_number, relaxation_step=current_relaxation_number, functional=Functional(calc["functional"].lower()), - cross_compatibility=True, + cross_compatibility=get_cross_compatibility( + optimade_structure_dict["elements"] + ), + energy_corrected=( + targets["energy"] + energy_correction + if targets["energy"] is not None + and energy_correction is 
not None + else None + ), + ) + energy_correction = ( + trajectory.energy_corrected - trajectory.energy + if trajectory.energy is not None + and trajectory.energy_corrected is not None + else None ) trajectories.append(trajectory) current_relaxation_number += 1 - if not has_trajectory_converged(trajectories): - return [] + trajectories = has_trajectory_converged(trajectories) return trajectories diff --git a/src/lematerial_fetcher/fetcher/catalysis_hub/fetch.py b/src/lematerial_fetcher/fetcher/catalysis_hub/fetch.py new file mode 100644 index 0000000..e69de29 diff --git a/src/lematerial_fetcher/fetcher/mp/fetch.py b/src/lematerial_fetcher/fetcher/mp/fetch.py index 421d9c5..7967855 100644 --- a/src/lematerial_fetcher/fetcher/mp/fetch.py +++ b/src/lematerial_fetcher/fetcher/mp/fetch.py @@ -66,7 +66,6 @@ def get_items_to_process(self) -> ItemsInfo: self.config.mp_bucket_name == "materialsproject-build" and self.config.mp_bucket_prefix in ["collections", "collections/"] ): - breakpoint() prefix = get_latest_collection_version_prefix( self.aws_client, self.config.mp_bucket_name, diff --git a/src/lematerial_fetcher/fetcher/mp/transform.py b/src/lematerial_fetcher/fetcher/mp/transform.py index 283cdb3..5ad4e64 100644 --- a/src/lematerial_fetcher/fetcher/mp/transform.py +++ b/src/lematerial_fetcher/fetcher/mp/transform.py @@ -10,12 +10,16 @@ TrajectoriesDatabase, ) from lematerial_fetcher.fetcher.mp.utils import ( - extract_structure_optimization_tasks, + extract_static_structure_optimization_tasks, map_tasks_to_functionals, ) from lematerial_fetcher.models.models import RawStructure from lematerial_fetcher.models.optimade import Functional, OptimadeStructure -from lematerial_fetcher.models.trajectories import Trajectory, has_trajectory_converged +from lematerial_fetcher.models.trajectories import ( + Trajectory, + close_to_primary_task, + has_trajectory_converged, +) from lematerial_fetcher.transform import BaseTransformer from lematerial_fetcher.utils.logging import 
logger @@ -67,12 +71,6 @@ def _transform_structure( pmg_structure = Structure.from_dict(mp_structure) - # TODO(ramlaoui): This does not handle with disordered structures - - species_at_sites = [str(site.specie) for site in pmg_structure.sites] - cartesian_site_positions = pmg_structure.cart_coords.tolist() - lattice_vectors = pmg_structure.lattice.matrix.tolist() - chemical_formula_reduced_dict = raw_structure.attributes["composition_reduced"] chemical_formula_reduced_elements = list(chemical_formula_reduced_dict.keys()) chemical_formula_reduced_ratios = list(chemical_formula_reduced_dict.values()) @@ -115,25 +113,25 @@ def _transform_structure( "elements": raw_structure.attributes["elements"], "nelements": raw_structure.attributes["nelements"], "elements_ratios": element_ratios, - # sites - "nsites": raw_structure.attributes["nsites"], - "cartesian_site_positions": cartesian_site_positions, - "species_at_sites": species_at_sites, - "species": species, # chemistry "chemical_formula_anonymous": raw_structure.attributes["formula_anonymous"], "chemical_formula_descriptive": str(pmg_structure.composition), "chemical_formula_reduced": chemical_formula_reduced, + "species": species, # dimensionality "dimension_types": [1, 1, 1], "nperiodic_dimensions": 3, - "lattice_vectors": lattice_vectors, } def _get_calc_targets(self, calc_output: dict[str, Any]) -> dict[str, Any]: """ - Get the targets of a calculation. + Get the targets of a calculation. These are extracted from a task and are then + either associated to a material or a trajectory. + These targets include: + - cartesian_site_positions + - species_at_sites + - nsites - energy - forces - stress tensor @@ -144,6 +142,7 @@ def _get_calc_targets(self, calc_output: dict[str, Any]) -> dict[str, Any]: ---------- calc_output : dict[str, Any] The output of an MP task calculation. + (task -> output) composition_reduced : dict[str, float] The composition of the material in reduced form. 
@@ -154,21 +153,36 @@ def _get_calc_targets(self, calc_output: dict[str, Any]) -> dict[str, Any]: """ targets = {} + + pmg_structure = Structure.from_dict(calc_output["structure"]) + targets["lattice_vectors"] = pmg_structure.lattice.matrix.tolist() + targets["cartesian_site_positions"] = pmg_structure.cart_coords.tolist() + # For some calculations, the unit cell contains less species than other for the same material ID + # So we need to determine them from the output structure of the calculation. + targets["species_at_sites"] = [str(site.specie) for site in pmg_structure.sites] + targets["nsites"] = len(targets["species_at_sites"]) + targets["energy"] = calc_output["energy"] + try: targets["magnetic_moments"] = [ site["properties"]["magmom"] for site in calc_output["structure"]["sites"] ] except (TypeError, KeyError): - logger.warning("No magnetic moments") targets["magnetic_moments"] = None - targets["forces"] = calc_output["ionic_steps"][-1]["forces"] + + targets["forces"] = calc_output["forces"] + targets["band_gap_indirect"] = calc_output["bandgap"] + # MP Charges are stored in an external file + targets["charges"] = None + # TODO(ramlaoui): Check if these are correct targets["dos_ef"] = calc_output.get("efermi", None) # dos_ef targets["total_magnetization"] = calc_output.get("magnetization", {}).get( "total_magnetization", None ) + try: targets["stress_tensor"] = calc_output["stress"] except KeyError: @@ -198,14 +212,16 @@ def _get_cross_compatibility_from_composition( cross_compatible = True non_compatible_elements = ["V", "Cs"] - # TODO(msiron): What about Yb? 
+ # NB: We keep Yb for Materials Project since Yb_3 is now used for element in non_compatible_elements: if element in composition_reduced.keys(): cross_compatible = False return cross_compatible - def _get_ionic_step_targets(self, ionic_step: dict[str, Any]) -> dict[str, Any]: + def _get_ionic_step_targets( + self, ionic_step: dict[str, Any], NELM: int + ) -> dict[str, Any]: """ Get the targets of an ionic step. These targets include: @@ -217,6 +233,9 @@ def _get_ionic_step_targets(self, ionic_step: dict[str, Any]) -> dict[str, Any]: ---------- ionic_step : dict[str, Any] The ionic step to get the targets from. + NELM : int + The number of electronic steps as parameter of the task. + This is used to determine if the ionic step is converged. Returns ------- @@ -228,6 +247,17 @@ def _get_ionic_step_targets(self, ionic_step: dict[str, Any]) -> dict[str, Any]: targets["stress_tensor"] = ionic_step["stress"] targets["energy"] = ionic_step["e_fr_energy"] + pmg_structure = Structure.from_dict(ionic_step["structure"]) + targets["lattice_vectors"] = pmg_structure.lattice.matrix.tolist() + targets["cartesian_site_positions"] = pmg_structure.cart_coords.tolist() + targets["species_at_sites"] = [str(site.specie) for site in pmg_structure.sites] + targets["nsites"] = len(targets["species_at_sites"]) + + if NELM is not None and len(ionic_step["electronic_steps"]) == NELM: + raise ValueError( + f"Ionic step has {len(ionic_step['electronic_steps'])} electronic steps, expected {NELM}" + ) + return targets def _get_task_targets( @@ -255,15 +285,22 @@ def _get_task_targets( The target parameters of the task. 
""" try: - targets = self._get_calc_targets( - task.attributes["output"], task.attributes["composition_reduced"] - ) + targets = self._get_calc_targets(task.attributes["output"]) except KeyError as e: logger.warning( f"Error getting targets for {material_id} with functional {functional}: {e}" ) return {} + last_ionic_step = task.attributes["calcs_reversed"][-1]["output"][ + "ionic_steps" + ][-1] + NELM = task.attributes["input"]["parameters"]["NELM"] + if len(last_ionic_step["electronic_steps"]) == NELM: + raise ValueError( + f"Last ionic step has {len(last_ionic_step['electronic_steps'])} electronic steps, expected {NELM}" + ) + return targets @@ -299,10 +336,12 @@ def transform_row( The transformed OptimadeStructure objects. If the list is empty, nothing from the structure should be included in the database. """ - tasks, calc_types = extract_structure_optimization_tasks( + tasks, calc_types = extract_static_structure_optimization_tasks( raw_structure, source_db, task_table_name ) - functionals = map_tasks_to_functionals(tasks, calc_types) + functionals = map_tasks_to_functionals( + tasks, calc_types, keep_all_calculations=False + ) if not functionals: return [] @@ -324,7 +363,7 @@ def transform_row( for functional in functionals.keys(): targets = targets_functionals[functional] optimade_structure = OptimadeStructure( - id=f"{raw_structure.attributes['material_id']}-{functional}", + id=f"{raw_structure.attributes['material_id']}-{functional.value}", source="mp", # Basic fields immutable_id=raw_structure.attributes["material_id"], @@ -337,6 +376,8 @@ def transform_row( cross_compatibility=cross_compatibility, # targets **targets, + compute_space_group=True, + compute_bawl_hash=True, ) optimade_structures.append(optimade_structure) @@ -365,7 +406,11 @@ def __init__(self, *args, **kwargs): ) def transform_tasks( - self, task: RawStructure, functional: Functional, material_id: str + self, + task: RawStructure, + functional: Functional, + material_id: str, + 
trajectory_number: int = 0, ) -> list[Trajectory]: """ Transform a raw Materials Project structure into Trajectory objects. @@ -378,6 +423,8 @@ def transform_tasks( The functional to use for the transformation. material_id : str The material id of the task. + trajectory_number : int + The number of the trajectory to use for the transformation. Returns ------- @@ -388,39 +435,65 @@ def transform_tasks( trajectories = [] relaxation_step = 0 + energy_correction = None for i, calc in enumerate(task.attributes["calcs_reversed"]): # TODO(ramlaoui): What about this input? # input_structure_fields = self._transform_structure(raw_structure, calc["input"]["structure"]) # ionic steps are stored in normal order (first step first) + parameters = task.attributes["input"]["parameters"] + NELM = parameters["NELM"] if parameters is not None else None for ionic_step in calc["output"]["ionic_steps"]: - input_structure_fields = self._transform_structure( - task, ionic_step["structure"] - ) - output_targets = self._get_ionic_step_targets(ionic_step) - - cross_compatibility = self._get_cross_compatibility_from_composition( - task.attributes["composition_reduced"] - ) - - trajectory = Trajectory( - id=f"{material_id}-{functional.value}-{relaxation_step}", - source="mp", - immutable_id=material_id, - **input_structure_fields, - **output_targets, - functional=functional, - last_modified=task.attributes["last_updated"]["$date"], - relaxation_step=relaxation_step, - relaxation_number=i, - cross_compatibility=cross_compatibility, - ) - - trajectories.append(trajectory) + try: + input_structure_fields = self._transform_structure( + task, ionic_step["structure"] + ) + output_targets = self._get_ionic_step_targets(ionic_step, NELM) + + cross_compatibility = ( + self._get_cross_compatibility_from_composition( + task.attributes["composition_reduced"] + ) + ) + + trajectory = Trajectory( + # For one material_id, there can be multiple trajectories even for the same functional + # So we need to add a 
number to the trajectory id to differentiate them + id=f"{material_id}-{trajectory_number}-{functional.value}-{relaxation_step}", + source="mp", + immutable_id=f"{material_id}", + **input_structure_fields, + **output_targets, + functional=functional, + last_modified=task.attributes["last_updated"]["$date"], + relaxation_step=relaxation_step, + relaxation_number=i, + cross_compatibility=cross_compatibility, + energy_corrected=( + output_targets["energy"] + energy_correction + if output_targets["energy"] is not None + and energy_correction is not None + else None + ), + ) + # avoid having to recompute the energy correction + # for every snapshot of the trajectory + energy_correction = ( + trajectory.energy_corrected - trajectory.energy + if trajectory.energy is not None + and trajectory.energy_corrected is not None + else None + ) + + trajectories.append(trajectory) + except Exception as e: + logger.debug( + f"Error transforming step {relaxation_step} of {material_id} with functional {functional.value}: {e}" + ) + continue relaxation_step += 1 - if not has_trajectory_converged(trajectories): - return [] + trajectories = has_trajectory_converged(trajectories) return trajectories @@ -451,11 +524,16 @@ def transform_row( list[Trajectory] The transformed Trajectory objects. 
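Review note: the trajectory id scheme switched to here inserts a `trajectory_number` between the material id and the functional tag, because a single material can have several relaxation trajectories for the same functional. A minimal sketch (example ids are hypothetical):

```python
# Sketch of the trajectory snapshot id scheme this diff switches to:
# material id, then trajectory number, then functional, then relaxation step,
# so that ids stay unique across multiple trajectories per functional.
def trajectory_id(material_id: str, trajectory_number: int,
                  functional: str, relaxation_step: int) -> str:
    return f"{material_id}-{trajectory_number}-{functional}-{relaxation_step}"
```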
""" - - tasks, calc_types = extract_structure_optimization_tasks( - raw_structure, source_db, task_table_name + tasks, calc_types = extract_static_structure_optimization_tasks( + raw_structure, + source_db, + task_table_name, + extract_static=False, + fallback_to_static=False, + ) + functionals = map_tasks_to_functionals( + tasks, calc_types, keep_all_calculations=True ) - functionals = map_tasks_to_functionals(tasks, calc_types) # Only keep tasks with a BY-C license license = raw_structure.attributes["builder_meta"]["license"] @@ -472,9 +550,19 @@ def transform_row( return [] trajectories = [] - for functional, task in functionals.items(): - trajectories.extend( - self.transform_tasks(task, functional, raw_structure.id) - ) + for functional, tasks_list in functionals.items(): + all_functional_trajectories = [ + self.transform_tasks( + task, functional, raw_structure.id, trajectory_number + ) + for trajectory_number, task in enumerate(tasks_list) + ] + if len(all_functional_trajectories) == 0: + continue + + trajectories.extend(all_functional_trajectories[0]) + for trajectory in all_functional_trajectories[1:]: + if close_to_primary_task(all_functional_trajectories[0], trajectory): + trajectories.extend(trajectory) return trajectories diff --git a/src/lematerial_fetcher/fetcher/mp/utils.py b/src/lematerial_fetcher/fetcher/mp/utils.py index 34e7bee..e2b1203 100644 --- a/src/lematerial_fetcher/fetcher/mp/utils.py +++ b/src/lematerial_fetcher/fetcher/mp/utils.py @@ -2,7 +2,6 @@ import gzip import json from collections import defaultdict -from datetime import datetime, timezone from enum import Enum from typing import Optional @@ -16,13 +15,15 @@ MP_FUNCTIONAL_MAPPING = { "GGA": Functional.PBE, "GGA+U": Functional.PBE, - "PBESol": Functional.PBESOL, + "PBEsol": Functional.PBESOL, + "r2SCAN": Functional.r2SCAN, "SCAN": Functional.SCAN, } class TaskType(Enum): STRUCTURE_OPTIMIZATION = "Structure Optimization" + STATIC = "Static" DEPRECATED = "Deprecated" @@ -125,13 
+126,17 @@ def add_jsonl_file_to_db(gzipped_file, db: Database, log_every: int = 1000): logger.info(f"Completed processing {processed} records") -def extract_structure_optimization_tasks( - raw_structure: RawStructure, source_db: StructuresDatabase, task_table_name: str +def extract_static_structure_optimization_tasks( + raw_structure: RawStructure, + source_db: StructuresDatabase, + task_table_name: str, + extract_static: bool = True, + fallback_to_static: bool = False, ) -> tuple[dict[str, RawStructure], dict[str, str]]: """ - Extract non deprecated structure optimization tasks from a raw Materials Project structure. + Extract non deprecated structure optimization and static tasks from a raw Materials Project structure. - This function retrieves the structure optimization tasks from the task table + This function retrieves the structure optimization and static tasks from the task table and returns them as a list of OptimadeStructure objects. Parameters @@ -142,6 +147,10 @@ def extract_structure_optimization_tasks( The source database instance to read from. task_table_name : str The name of the task table to read from. + extract_static : bool + Whether to extract static tasks. + fallback_to_static : bool + Whether to fallback to static tasks if no structure optimization tasks are found. Returns ------- @@ -150,13 +159,16 @@ def extract_structure_optimization_tasks( - The first dictionary maps task IDs to RawStructure objects. - The second dictionary maps task IDs to the calculation type. 
""" + include_list = [TaskType.STRUCTURE_OPTIMIZATION.value] + if extract_static: + include_list.append(TaskType.STATIC.value) # This means that the raw structure is a material if "task_types" in raw_structure.attributes: - structure_optimization_tasks = [ + static_and_structure_optimization_tasks = [ mp_id for mp_id, task_type in raw_structure.attributes["task_types"].items() - if task_type == TaskType.STRUCTURE_OPTIMIZATION.value + if task_type in include_list ] else: raise ValueError( @@ -167,9 +179,19 @@ def extract_structure_optimization_tasks( non_deprecated_task_ids = [ mp_id - for mp_id in structure_optimization_tasks + for mp_id in static_and_structure_optimization_tasks if mp_id not in raw_structure.attributes["deprecated_tasks"] ] + + # If no non-deprecated tasks are found, fallback to static tasks + if not non_deprecated_task_ids and fallback_to_static: + non_deprecated_task_ids = [ + mp_id + for mp_id, task_type in raw_structure.attributes["task_types"].items() + if mp_id not in raw_structure.attributes["deprecated_tasks"] + and task_type == TaskType.STATIC.value + ] + calc_types = { mp_id: raw_structure.attributes["calc_types"][mp_id] for mp_id in non_deprecated_task_ids @@ -202,7 +224,7 @@ def map_task_to_functional( if task_calc_type is None: task_calc_type = task.attributes["calc_type"] - functional = task_calc_type.split(" " + TaskType.STRUCTURE_OPTIMIZATION.value)[0] + functional = task_calc_type.split(" ")[0] # Extracts the functional if functional in MP_FUNCTIONAL_MAPPING: return MP_FUNCTIONAL_MAPPING[functional] else: @@ -210,28 +232,44 @@ def map_task_to_functional( def map_tasks_to_functionals( - tasks: list[RawStructure], task_calc_types: dict[str, str] -) -> dict[str, RawStructure]: + tasks: list[RawStructure], + task_calc_types: dict[str, str], + keep_all_calculations: bool = False, +) -> dict[str, RawStructure | list[RawStructure]]: """ Map tasks to functionals, selecting the most appropriate task for each functional. 
-    For most functionals, the most recent task is selected.
-    For PBE, GGA+U is preferred over GGA regardless of date.
+    We follow the Materials Project strategy for selecting the most appropriate
+    task for each functional [1]:
+    For most functionals, we:
+    - only include non-deprecated tasks (valid calculations)
+    - prefer a static calculation over a structure optimization
+    - pick the structure with the lowest energy output
+    For PBE, GGA+U is preferred over GGA regardless of energy value.
 
     Parameters
     ----------
     tasks : List[RawStructure]
         List of task structures to process
+    task_calc_types : dict[str, str]
+        Dictionary mapping task IDs to calculation types
+    keep_all_calculations : bool
+        Whether to keep all calculations or only the most appropriate one
+        per material. This is useful for extracting trajectories.
 
     Returns
     -------
     Dict[str, RawStructure]
         Dictionary mapping functional names to selected task
+
+    References
+    ----------
+    [1] https://github.com/materialsproject/emmet/blob/682277da9f11af40073d5a4fa6b306fda9a1d582/emmet-core/emmet/core/vasp/material.py#L109
     """
     functional_tasks = defaultdict(list)
 
     for task_id, calc_type in task_calc_types.items():
-        functional = calc_type.split(" " + TaskType.STRUCTURE_OPTIMIZATION.value)[0]
+        functional = calc_type.split(" ")[0]  # Extracts the functional
         if task_id not in tasks:
             logger.warning(
                 f"Task {task_id} was not found in your tasks databases, "
@@ -248,54 +286,35 @@ def map_tasks_to_functionals(
             )
 
     # For PBE, prefer GGA+U over GGA
+    # Except for trajectories, where we take both
+    # and let the filtering decide which steps to keep
     if "GGA+U" in functional_tasks:
-        functional_tasks[Functional.PBE] = functional_tasks["GGA+U"]
+        if keep_all_calculations:
+            functional_tasks[Functional.PBE].extend(functional_tasks["GGA+U"])
+        else:
+            functional_tasks[Functional.PBE] = functional_tasks["GGA+U"]
 
-    selected_tasks = {}
+    def _static_lowest_energy(task: RawStructure) -> tuple:
+        parameters = task.attributes["input"]["parameters"]
 
-    for functional, task_list in functional_tasks.items():
-        selected_task = select_most_recent_task(task_list)
-
-        if selected_task:
-            selected_tasks[functional] = selected_task
-
-    return selected_tasks
-
-
-def select_most_recent_task(tasks: list[RawStructure]) -> Optional[RawStructure]:
-    """
-    Select the most recent task from a list of tasks.
-
-    Parameters
-    ----------
-    tasks : List[RawStructure]
-        List of tasks to choose from
+        tags_score = sum(
+            (parameters.get(tag, False) if parameters else False)
+            for tag in ["LASPH", "ISPIN"]
+        )
 
-    Returns
-    -------
-    Optional[RawStructure]
-        The most recent task, or None if no valid tasks
-    """
-    if not tasks:
-        return None
+        return (
+            -int(task.attributes["task_type"] == TaskType.STATIC.value),
+            -tags_score,
+            task.attributes["output"]["energy"] / task.attributes["nsites"],
+        )
 
-    latest_task = None
-    latest_date = datetime.min.replace(tzinfo=timezone.utc)
+    selected_tasks = {}
+    for functional, task_list in functional_tasks.items():
+        sorted_tasks = sorted(task_list, key=_static_lowest_energy)
 
-    for task in tasks:
-        # Extract the completion date from task attributes
-        date_info = task.attributes.get("last_updated", {})
-        date_str = date_info.get("$date", "")
-        if not date_str:
-            continue
+        if keep_all_calculations:
+            selected_tasks[functional] = sorted_tasks
+        else:
+            selected_tasks[functional] = sorted_tasks[0]
 
-        try:
-            # Parse the date string from format: '2016-09-16T06:29:25Z'
-            task_date = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
-            if task_date > latest_date:
-                latest_date = task_date
-                latest_task = task
-        except (ValueError, TypeError):
-            logger.warning(f"Could not parse date '{date_str}' for task {task.id}")
-
-    return latest_task if latest_task else None
+    return selected_tasks
diff --git a/src/lematerial_fetcher/fetcher/oqmd/transform.py b/src/lematerial_fetcher/fetcher/oqmd/transform.py
index 7453186..4531ead 100644
--- a/src/lematerial_fetcher/fetcher/oqmd/transform.py
+++ b/src/lematerial_fetcher/fetcher/oqmd/transform.py
@@ -343,9 +343,15 @@ def _get_calculations(
         raw_structure["id"]: raw_structure[entry_id_key]
         for raw_structure in raw_structures
     }
-    entry_ids = list(structure_id_to_entry_id.values())
+    entry_ids = [
+        str(entry_id)
+        for entry_id in structure_id_to_entry_id.values()
+        if entry_id is not None
+    ]
 
     # Get a list of all the calculations for the entry_ids
-    custom_query = f"SELECT * FROM calculations WHERE entry_id IN ({', '.join(map(str, entry_ids))})"
+    custom_query = (
+        f"SELECT * FROM calculations WHERE entry_id IN ({', '.join(entry_ids)})"
+    )
     fetched_calculations = source_db.fetch_items(query=custom_query)
 
     # We need to group the calculations by entry_id because different structures can have the same entry_id
@@ -356,6 +362,10 @@ def _get_calculations(
     # Group the calculations by structure_id
     calculations = defaultdict(list)
     for structure_id in structure_id_to_entry_id.keys():
+        # The structure has no entry_id, so we skip it
+        if structure_id_to_entry_id[structure_id] is None:
+            continue
+
         calculations[structure_id] = calculations_by_entry_id[
             structure_id_to_entry_id[structure_id]
         ]
@@ -432,6 +442,9 @@ def _extract_atoms_attributes(
             forces.append([atom["fx"], atom["fy"], atom["fz"]])
             charges.append(atom["charge"])
 
+        if any(charge is None for charge in charges):
+            charges = None
+
         if any(any(f is None for f in force) for force in forces):
             forces = None
 
@@ -536,6 +549,9 @@ def transform_row(
         for raw_structure, structure_id in zip(
             raw_structures, calculations_dict.keys()
         ):
+            if structure_id not in calculations_dict:
+                continue
+
             calculations = calculations_dict[structure_id]
             values_dict = values_dict_dict[structure_id]
 
@@ -544,14 +560,19 @@ def transform_row(
                 continue
 
             static_calculation = calculations[0]
+            if static_calculation["energy_pa"] is None:
+                logger.warning(
+                    f"No energy_pa found for structure {structure_id}, skipping"
+                )
+                continue
+
values_dict["energy"] = ( static_calculation["energy_pa"] * values_dict["nsites"] ) - # TODO(msiron): Agree on band gap - # values_dict["band_gap_indirect"] = static_calculation["band_gap"] + values_dict["band_gap_indirect"] = static_calculation["band_gap"] species_at_sites, frac_coords, forces, charges = ( - self._extract_atoms_attributes(atoms) + self._extract_atoms_attributes(atoms[structure_id]) ) structure = Structure( species=species_at_sites, @@ -582,7 +603,7 @@ def transform_row( # Compatibility of the DFT settings # dict from string to dict settings = ast.literal_eval(static_calculation["settings"]) - if settings["ispin"] in ["2", 2]: + if settings.get("ispin", None) in ["2", 2]: values_dict["cross_compatibility"] = True else: values_dict["cross_compatibility"] = False @@ -593,14 +614,19 @@ def transform_row( # TODO(Ramlaoui): Do we just want to skip the structure or set cross_compatibility to False? values_dict["cross_compatibility"] = False - optimade_structure = OptimadeStructure( - **values_dict, - id=values_dict["immutable_id"], - source="oqmd", - # Couldn't find a way to get the last modified date from the source database - last_modified=datetime.now().isoformat(), - functional=Functional.PBE, - ) + try: + optimade_structure = OptimadeStructure( + **values_dict, + id=f"{values_dict['immutable_id']}-{Functional.PBE.value}", + source="oqmd", + # Couldn't find a way to get the last modified date from the source database + last_modified=datetime.now().isoformat(), + compute_space_group=True, + compute_bawl_hash=True, + ) + except Exception as e: + logger.warning(f"Error transforming structure {structure_id}: {e}") + continue optimade_structures.append(optimade_structure) return optimade_structures @@ -749,7 +775,6 @@ def get_values_dict_dict_from_structure_id( for entry_id, calculations in calculations_dict.items(): if len(calculations) == 0: - logger.warning(f"No calculations found for entry {entry_id}") continue if entry_id in entry_id_to_ignore: @@ 
-762,6 +787,7 @@ def get_values_dict_dict_from_structure_id( continue entry_trajectories = [] + energy_correction = None # The id of the trajectory will be the id of the final output structure # similar to how it is done with MP where we use the materials_id to name # the trajectory. @@ -803,17 +829,28 @@ def get_values_dict_dict_from_structure_id( output_relaxation_step = current_relaxation_step + calculation["nsteps"] output_values_dict["immutable_id"] = trajectory_immutable_id - entry_trajectories.append( - Trajectory( - id=f"{trajectory_immutable_id}-{Functional.PBE.value}-{output_relaxation_step}", - source="oqmd", - last_modified=datetime.now().isoformat(), # not available for OQMD - relaxation_number=current_relaxation_number, - relaxation_step=output_relaxation_step, - cross_compatibility=cross_compatibility, - **output_values_dict, - ) + current_trajectory = Trajectory( + id=f"{trajectory_immutable_id}-{Functional.PBE.value}-{output_relaxation_step}", + source="oqmd", + last_modified=datetime.now().isoformat(), # not available for OQMD + relaxation_number=current_relaxation_number, + relaxation_step=output_relaxation_step, + cross_compatibility=cross_compatibility, + **output_values_dict, + energy_corrected=( + output_values_dict["energy"] + energy_correction + if output_values_dict["energy"] is not None + and energy_correction is not None + else None + ), + ) + energy_correction = ( + current_trajectory.energy_corrected - current_trajectory.energy + if current_trajectory.energy is not None + and current_trajectory.energy_corrected is not None + else None ) + entry_trajectories.append(current_trajectory) # TODO(Ramlaoui): No relaxation sometimes current_relaxation_step += calculation["nsteps"] @@ -823,13 +860,7 @@ def get_values_dict_dict_from_structure_id( logger.warning(f"Entry {entry_id} did not converge, skipping") continue - # We only check that the forces in the last step are not small - # because we don't have all the steps - # Energy is None in the 
first step usually, but we add the structure - # regardless because it might be useful for IS2RE/S tasks - if not has_trajectory_converged(entry_trajectories, energy_threshold=None): - continue - + entry_trajectories = has_trajectory_converged(entry_trajectories) trajectories.extend(entry_trajectories) return trajectories diff --git a/src/lematerial_fetcher/models/optimade.py b/src/lematerial_fetcher/models/optimade.py index 33f504d..ac07685 100644 --- a/src/lematerial_fetcher/models/optimade.py +++ b/src/lematerial_fetcher/models/optimade.py @@ -2,16 +2,26 @@ import datetime import math import re -from enum import Enum +import warnings from typing import Optional +import moyopy +import numpy as np +from material_hasher.hasher.bawl import BAWLHasher +from moyopy.interface import MoyoAdapter from pydantic import BaseModel, Field, field_validator, model_validator +from pymatgen.core import Element, Structure +from lematerial_fetcher.models.utils.correction import apply_mp_2020_energy_correction +from lematerial_fetcher.models.utils.enums import Functional, Source +from lematerial_fetcher.utils.logging import logger -class Functional(str, Enum): - PBE = "pbe" - PBESOL = "pbesol" - SCAN = "scan" +# TODO(Ramlaoui, msiron): Take care of warnings in the hasher +warnings.filterwarnings("ignore") + +SG_MOYOPY_SYMPREC = 1e-4 + +MAX_FORCE_EV_A = 0.1 # eV/Å class OptimadeStructure(BaseModel): @@ -26,7 +36,7 @@ class OptimadeStructure(BaseModel): min_length=1, description="Unique identifier for the structure", ) - source: str = Field( + source: Source = Field( ..., min_length=1, description="Source database of the structure", @@ -129,6 +139,10 @@ class OptimadeStructure(BaseModel): None, description="Total energy in eV", ) + energy_corrected: Optional[float] = Field( + None, + description="Corrected energy in eV", + ) magnetic_moments: Optional[list[float]] = Field( None, min_length=1, @@ -147,25 +161,77 @@ class OptimadeStructure(BaseModel): None, description="Density of 
states at Fermi level", ) + charges: Optional[list[float]] = Field( + None, + min_length=1, + description="Charges on each site", + ) + band_gap_indirect: Optional[float] = Field( + None, + description="Indirect band gap in eV", + ) functional: Optional[Functional] = Field( None, description="Exchange-correlation functional" ) cross_compatibility: bool = Field(description="Cross-compatibility flag") - entalpic_fingerprint: Optional[str] = Field( + space_group_it_number: Optional[int] = Field( + None, + description="Space group international number", + ) + bawl_fingerprint: Optional[str] = Field( None, min_length=1, - description="Entalpic fingerprint hash", + description="BAWL fingerprint hash", ) + def __init__( + self, + compute_space_group: bool = True, + compute_bawl_hash: bool = False, + **kwargs, + ): + try: + structure = Structure( + species=kwargs["species_at_sites"], + coords=kwargs["cartesian_site_positions"], + lattice=kwargs["lattice_vectors"], + coords_are_cartesian=True, + ) + + # Compute space group with moyopy + if compute_space_group: + cell = MoyoAdapter.from_structure(structure) + dataset = moyopy.MoyoDataset( + cell=cell, + symprec=SG_MOYOPY_SYMPREC, + angle_tolerance=None, + setting=None, + ) + space_group = dataset.number + kwargs["space_group_it_number"] = space_group + + if compute_bawl_hash: + kwargs["bawl_fingerprint"] = BAWLHasher().get_material_hash(structure) + + except Exception as e: + logger.warning( + f"Failed to create pymatgen structure from {kwargs['immutable_id']}. Error: {e}" + ) + + super().__init__(**kwargs) + # # Field-level validators # - def _validate_with_number_of_sites(self, v, nsites): + def _validate_with_number_of_sites(self, v, nsites, field_name=""): if v is None: return v if len(v) != nsites: - raise ValueError(f"List must have exactly {nsites} items") + raise ValueError( + f"List {field_name} must have exactly {nsites} items. " + f"Got {len(v)} items. 
Input value: {v}" + ) return v @field_validator("cartesian_site_positions", "forces", mode="before") @@ -175,10 +241,16 @@ def validate_3d_vector(cls, v): if v is None: return v if any(len(row) != 3 for row in v): - raise ValueError("Vector must have exactly 3 components") + invalid_rows = [i for i, row in enumerate(v) if len(row) != 3] + raise ValueError( + f"Each vector must have exactly 3 components. Found vectors with wrong dimensions at indices: {invalid_rows}. " + f"Expected format: [[x, y, z], ...], got: {v}" + ) return v except Exception as e: - raise ValueError(f"Invalid vector format: {e}") from e + raise ValueError( + f"Invalid vector format: {str(e)}. Input value: {v}" + ) from e @field_validator("stress_tensor", "lattice_vectors", mode="before") @classmethod @@ -186,7 +258,22 @@ def validate_3x3_matrix(cls, v): if v is None: return v if len(v) != 3 or any(len(row) != 3 for row in v): - raise ValueError("Matrix must be a 3x3 matrix") + raise ValueError( + f"Matrix must be a 3x3 matrix. Got shape {len(v)}x{len(v[0]) if v else 0}. " + f"Input value: {v}" + ) + return v + + @field_validator("species_at_sites") + @classmethod + def validate_species_at_sites(cls, v): + """ + Ensure that the species contain only valid elements. + """ + if any(not Element.is_valid_symbol(element) for element in v): + raise ValueError( + f"Field species_at_sites must contain only valid elements. Got: {v}" + ) return v @field_validator("elements_ratios") @@ -198,7 +285,8 @@ def validate_sum_of_elements_ratios(cls, v): ratio_sum = sum(v) if not math.isclose(ratio_sum, 1.0, rel_tol=1e-5, abs_tol=1e-8): raise ValueError( - f"Sum of elements_ratios must be 1.0 (got {ratio_sum:.6f}). Each ratio represents the fraction of each element in the structure." + f"Sum of elements_ratios must be 1.0 (got {ratio_sum:.6f}). " + f"Current ratios: {v}. Each ratio represents the fraction of each element in the structure." 
             )
         return v
 
@@ -210,7 +298,10 @@ def validate_elements_order(cls, v):
         """
         if v != sorted(v):
             raise ValueError(
-                f"Elements must be in alphabetical order. Current order: {', '.join(v)}, Expected order: {', '.join(sorted(v))}"
+                f"Elements must be in alphabetical order. "
+                f"Current order: {', '.join(v)}, "
+                f"Expected order: {', '.join(sorted(v))}. "
+                f"Please reorder the elements list."
             )
         return v
 
@@ -225,8 +316,9 @@ def validate_and_reorder_anonymous_formula(cls, v: str) -> str:
         pattern = r"^[A-Z](?:\d+)?(?:[A-Z](?:\d+)?)*$"
         if not re.match(pattern, v):
             raise ValueError(
-                "Anonymous formula must consist of capital letters with optional numbers. "
-                f"Got: {v}"
+                "Invalid anonymous formula format. "
+                "Formula must consist of capital letters with optional numbers (e.g., A2B3C). "
+                f"Got: '{v}'. Please check for invalid characters or format."
             )
 
         # extract letter-number pairs
@@ -255,14 +347,15 @@ def validate_chemical_formula_descriptive(cls, v: str) -> str:
         # Remove trailing numbers
         v = re.sub(r"([A-Z][a-z]?)1\b", r"\1", v)
 
-        # validate format (single uppercase letter followed by optional number)
         pattern = re.compile(
             r"^(?:[A-Z][a-z]?(?:[2-9]\d*|1\d+)?)(?:\s+[A-Z][a-z]?(?:[2-9]\d*|1\d+)?)*$"
         )
         if not pattern.match(v):
             raise ValueError(
-                "Chemical formula descriptive must consist of capital letters with optional numbers. "
-                f"Got: {v}"
+                "Invalid descriptive formula format. "
+                "Formula must consist of element symbols (capital letter + optional lowercase) "
+                "with optional numbers, separated by spaces. "
+                f"Got: '{v}'. Example of valid format: 'H2 O' or 'Fe2 O3'"
             )
         return v
 
@@ -277,21 +370,26 @@ def validate_chemical_formula_reduced(cls, v: str) -> str:
         # Check for parentheses
         if "(" in v or ")" in v:
             raise ValueError(
-                f"Chemical formula reduced must not contain parentheses. Got: {v}"
+                f"Chemical formula reduced must not contain parentheses. Got: '{v}'. "
+                "Please remove all parentheses from the formula."
             )
 
         # Check for any "1" in the formula (not just trailing ones)
         if re.search(r"([A-Z][a-z]?)1(?!\d)", v):
+            matches = re.finditer(r"([A-Z][a-z]?)1(?!\d)", v)
+            problematic_elements = [m.group(1) for m in matches]
             raise ValueError(
-                f"Chemical formula reduced must not have ones (like Cs1O4). Got: {v}"
+                f"Chemical formula reduced must not have ones (e.g., {', '.join(problematic_elements)}1). "
+                f"Got: '{v}'. Remove the '1' subscripts or use proper stoichiometric numbers."
             )
 
         # Validate format (element symbols followed by optional numbers)
         pattern = re.compile(r"^(?:[A-Z][a-z]?(?:\d+)?)+$")
         if not pattern.match(v):
             raise ValueError(
-                "Chemical formula reduced must consist of element symbols followed by optional numbers (no trailing ones). "
-                f"Got: {v}"
+                "Invalid reduced formula format. "
+                "Formula must consist of element symbols followed by optional numbers. "
+                f"Got: '{v}'. Example of valid format: 'Fe2O3' or 'NaCl'"
             )
         return v
 
@@ -309,10 +407,64 @@ def validate_date_format(cls, v: datetime.datetime) -> datetime.datetime:
             return datetime.datetime.strptime(formatted, "%Y-%m-%d")
         except (ValueError, AttributeError) as e:
             raise ValueError(
-                "last_modified must be in format 'YYYY-MM-DD'. "
-                f"Got: {v}. Error: {str(e)}"
+                "Invalid date format for last_modified. "
+                f"Got: {v}. Expected format: 'YYYY-MM-DD'. "
+                f"Error details: {str(e)}"
             ) from e
 
+    @field_validator("space_group_it_number")
+    @classmethod
+    def validate_space_group_it_number(cls, v: int) -> int:
+        """
+        Ensure the space group IT number is properly formatted.
+        """
+        if v is None:
+            return v
+        if v < 1 or v > 230:
+            raise ValueError(
+                f"Space group IT number must be between 1 and 230. Got: {v}"
+            )
+        return v
+
+    @field_validator("dimension_types")
+    @classmethod
+    def validate_dimension_types(cls, v: list[int]) -> list[int]:
+        """
+        Ensure the dimension types are properly formatted.
+
+        We should expect it to be [1, 1, 1] for any structure.
+ """ + if v != [1, 1, 1]: + raise ValueError(f"Field dimension_types must be [1, 1, 1]. Got: {v}") + return v + + @field_validator("nperiodic_dimensions") + @classmethod + def validate_nperiodic_dimensions(cls, v: int) -> int: + """ + Ensure the number of periodic dimensions is 3. + """ + if v != 3: + raise ValueError(f"Field nperiodic_dimensions must be 3. Got: {v}") + return v + + @field_validator("forces") + @classmethod + def validate_forces_too_high( + cls, v: list[list[float]] | None + ) -> list[list[float]] | None: + """ + Ensure the forces are not too high. + """ + if v is None: + return v + max_force = max(np.linalg.norm(force) for force in v) + if max_force > MAX_FORCE_EV_A: + raise ValueError( + f"Forces are too high. Maximum allowed force is {MAX_FORCE_EV_A} eV/Å. Got: {max_force}" + ) + return v + # # Cross-field validators # @@ -328,13 +480,17 @@ def check_consistency(self): nsites = self.nsites # Check elements and ratios consistency - if len(elements) != len(elements_ratios): + if not ( + len(elements) + == len(elements_ratios) + == nelements + == len(self.species) + == len(self.chemical_formula_descriptive.split()) + ): raise ValueError( - f"Number of elements ({len(elements)}) must match number of element ratios ({len(elements_ratios)})" - ) - if nelements != len(elements): - raise ValueError( - f"nelements ({nelements}) must match the number of unique elements ({len(elements)})" + f"Number of elements ({len(elements)}) must match number of element ratios ({len(elements_ratios)}), " + f"nelements ({nelements}), species ({len(self.species)}) and chemical formula descriptive " + f"({len(self.chemical_formula_descriptive.split())})" ) # Realign elements and ratios (maintaining alphabetical order) @@ -345,14 +501,34 @@ def check_consistency(self): # Check nsites consistency self.cartesian_site_positions = self._validate_with_number_of_sites( - self.cartesian_site_positions, nsites + self.cartesian_site_positions, nsites, "cartesian_site_positions" ) 
         self.species_at_sites = self._validate_with_number_of_sites(
-            self.species_at_sites, nsites
+            self.species_at_sites, nsites, "species_at_sites"
         )
-        self.forces = self._validate_with_number_of_sites(self.forces, nsites)
+        self.forces = self._validate_with_number_of_sites(self.forces, nsites, "forces")
         self.magnetic_moments = self._validate_with_number_of_sites(
-            self.magnetic_moments, nsites
+            self.magnetic_moments, nsites, "magnetic_moments"
+        )
+        self.charges = self._validate_with_number_of_sites(
+            self.charges, nsites, "charges"
+        )
+
+        # Validation using the Pymatgen structure
+        structure = Structure(
+            self.lattice_vectors,
+            self.species_at_sites,
+            self.cartesian_site_positions,
+            coords_are_cartesian=True,
         )
 
+        # Apply the energy correction
+        if self.energy_corrected is None:
+            self.energy_corrected = apply_mp_2020_energy_correction(
+                structure, self.energy, self.functional, self.source
+            )
+
         return self
+
+
+# TODO(Ramlaoui): Check that rows with MP match the API
diff --git a/src/lematerial_fetcher/models/trajectories.py b/src/lematerial_fetcher/models/trajectories.py
index b68d67a..52b5c12 100644
--- a/src/lematerial_fetcher/models/trajectories.py
+++ b/src/lematerial_fetcher/models/trajectories.py
@@ -1,12 +1,13 @@
 # Copyright 2025 Entalpic
 import numpy as np
-from pydantic import Field, model_validator
+from pydantic import Field, field_validator, model_validator
 
 from lematerial_fetcher.models.optimade import OptimadeStructure
 from lematerial_fetcher.utils.logging import logger
 
-ENERGY_CONVERGENCE_THRESHOLD = 2e-2  # MPtrj default
-FORCE_CONVERGENCE_THRESHOLD = 0.2
+# MPtrj defaults
+ENERGY_CONVERGENCE_THRESHOLD = 2e-2  # Difference with the primary MP task
+MAX_ENERGY_DIFF = 1  # 1 eV
 
 
 class Trajectory(OptimadeStructure):
@@ -15,6 +16,14 @@ class Trajectory(OptimadeStructure):
         ..., description="Relaxation number of the trajectory"
     )
 
+    @field_validator("forces")
+    @classmethod
+    def validate_forces_too_high(
+        cls, v: list[list[float]] | None
+    ) -> list[list[float]] | None:
+        """Override the parent class validator to avoid checking forces."""
+        return v
+
     @model_validator(mode="after")
     def validate_relaxation_trajectories(self):
         relaxation_number = np.array(self.relaxation_number)
@@ -28,50 +37,101 @@ def validate_relaxation_trajectories(self):
         return self
 
 
+def close_to_primary_task(
+    primary_trajectories: list[Trajectory], trajectories: list[Trajectory]
+) -> bool:
+    """
+    This guarantees that the final structure's energy is close to the primary
+    trajectory's final structure's energy.
+
+    This is used for MP to only keep the most appropriate trajectories for
+    a given material.
+
+    Parameters
+    ----------
+    primary_trajectories : list[Trajectory]
+        The primary trajectories.
+    trajectories : list[Trajectory]
+        The trajectories to check.
+
+    Returns
+    -------
+    bool
+        True if the trajectory is close to the primary trajectory, False otherwise.
+    """
+    if len(trajectories) == 0 or len(primary_trajectories) == 0:
+        return False
+
+    if trajectories[-1].energy is None or primary_trajectories[-1].energy is None:
+        return False
+
+    energy_diff = np.abs(
+        trajectories[-1].energy / trajectories[-1].nsites
+        - primary_trajectories[-1].energy / primary_trajectories[-1].nsites
+    )
+    if energy_diff <= ENERGY_CONVERGENCE_THRESHOLD:
+        return True
+
+    logger.debug(
+        f"Trajectory {trajectories[-1].id} has energy difference: {energy_diff:.4f} eV"
+    )
+    return False
+
+
 def has_trajectory_converged(
     trajectories: list[Trajectory],
-    energy_threshold: float | None = ENERGY_CONVERGENCE_THRESHOLD,
-    force_threshold: float | None = FORCE_CONVERGENCE_THRESHOLD,
+    max_energy_diff: float | None = MAX_ENERGY_DIFF,
 ) -> bool:
     """
     Check if the full trajectory has converged.
 
-    This also excludes trajectories where no last step has no forces
-    or energy.
+    This also excludes trajectories where forces or energy are not available.
 
     Parameters
     ----------
     trajectories : list[Trajectory]
         The trajectories to check.
+    max_energy_diff : float | None
+        The maximum energy difference between a structure and the last structure
+        in the trajectory.
 
     Returns
     -------
     bool
         True if the trajectory has converged, False otherwise.
     """
-    # If the last step has no energy or forces, we cannot check for convergence
-    # and we don't want the trajectory to be pushed
-    if trajectories[-1].energy is None or trajectories[-1].forces is None:
-        logger.warning(
-            f"Trajectory {trajectories[-1].id} has no energy or forces, skipping"
-        )
-        return False
+    filtered_trajectories = []
 
-    if energy_threshold is not None and len(trajectories) > 1:
-        if np.abs(trajectories[-1].energy - trajectories[-2].energy) > energy_threshold:
-            logger.warning(
-                f"Trajectory {trajectories[-1].id} has not converged, energy difference: {np.abs(trajectories[-1].energy - trajectories[-2].energy):.2f} eV"
-            )
-            return False
-
-    if force_threshold is not None:
-        if (
-            np.linalg.norm(np.array(trajectories[-1].forces), axis=1).max()
-            > force_threshold
-        ):
-            logger.warning(
-                f"Trajectory {trajectories[-1].id} has not converged, max force norm: {np.linalg.norm(np.array(trajectories[-1].forces), axis=1).max():.2f} eV/A"
+    for i, trajectory in enumerate(trajectories):
+        if trajectory.energy is None or trajectory.forces is None:
+            logger.debug(
+                f"Trajectory {trajectory.id} has no energy or forces, skipping"
             )
-            return False
+            continue
+        filtered_trajectories.append(trajectory)
+
+    trajectories = filtered_trajectories
+
+    if len(trajectories) == 0:
+        return []
 
-    return True
+    final_trajectory = trajectories[-1]
+
+    filtered_trajectories = []
+
+    for i, trajectory in enumerate(trajectories):
+        if i != len(trajectories) - 1:
+            energy_diff = (
+                trajectory.energy / trajectory.nsites
+                - final_trajectory.energy / final_trajectory.nsites
+            )
+            if (
+                energy_diff <= max_energy_diff
+            ):  # check if frame has energy higher than 1eV/atom
+                filtered_trajectories.append(trajectory)
+            else:
+                logger.debug(
+                    f"Trajectory {trajectory.id} has not converged, energy difference: {energy_diff:.4f} eV"
+                )
+
+    return filtered_trajectories
diff --git a/src/lematerial_fetcher/models/utils/correction.py b/src/lematerial_fetcher/models/utils/correction.py
new file mode 100644
index 0000000..d7cad26
--- /dev/null
+++ b/src/lematerial_fetcher/models/utils/correction.py
@@ -0,0 +1,95 @@
+# Copyright 2025 Entalpic
+import json
+import os
+
+from pymatgen.core import Structure
+from pymatgen.entries.compatibility import (
+    ComputedStructureEntry,
+    MaterialsProject2020Compatibility,
+)
+
+from lematerial_fetcher.models.utils.enums import Functional, Source
+from lematerial_fetcher.utils.logging import logger
+
+MPC = MaterialsProject2020Compatibility()
+
+
+POTCAR_INFO = json.load(open(os.path.join(os.path.dirname(__file__), "potcar.json")))
+U_VALUES = {
+    "Co": 3.32,
+    "Cr": 3.7,
+    "Fe": 5.3,
+    "Mn": 3.9,
+    "Mo": 4.38,
+    "Ni": 6.2,
+    "V": 3.25,
+    "W": 6.2,
+}
+
+
+def apply_mp_2020_energy_correction(
+    structure: Structure,
+    energy: float | None,
+    functional: Functional,
+    source: Source,
+) -> float | None:
+    """
+    Apply the MP 2020 energy correction to the energy.
+
+    Parameters
+    ----------
+    structure : Structure
+        The structure to apply the correction to.
+    energy : float | None
+        The energy to apply the correction to.
+    functional : Functional
+        The functional to use for the correction.
+    source : Source
+        The source of the structure.
+
+    Returns
+    -------
+    float | None
+        The corrected energy.
+ """ + + if energy is None or functional != Functional.PBE: + return energy + + elements = [e.name for e in structure.composition.elements] + + if any(element in ["Po", "At"] for element in elements): + return None + + if source in [Source.MP, Source.OQMD] and "V" in elements: + return None + + # Check if the structure contains O or F to use the correct U value + hubbards = None + if any(element in ["O", "F"] for element in elements): + hubbards = {k: v for k, v in U_VALUES.items() if k in elements} + + potcar_sym = [ + POTCAR_INFO[element] + for element in (set(elements) - set("V")) + if element in POTCAR_INFO + ] + + if source == Source.ALEXANDRIA and "V" in elements: + potcar_sym.append("PAW_PBE V_sv 07Sep2000") + + try: + cse = ComputedStructureEntry( + structure, + energy, + parameters={ + "run_type": "GGA", + "hubbards": hubbards, + "potcar_symbols": potcar_sym, + }, + ) + processed_cse = MPC.process_entry(cse) + return processed_cse.energy if processed_cse else None + except Exception as e: + logger.warning(f"Failed to apply MP 2020 energy correction: {e}") + return None diff --git a/src/lematerial_fetcher/models/utils/enums.py b/src/lematerial_fetcher/models/utils/enums.py new file mode 100644 index 0000000..7525b5a --- /dev/null +++ b/src/lematerial_fetcher/models/utils/enums.py @@ -0,0 +1,15 @@ +# Copyright 2025 Entalpic +from enum import Enum + + +class Functional(str, Enum): + PBE = "pbe" + PBESOL = "pbesol" + SCAN = "scan" + r2SCAN = "r2scan" + + +class Source(str, Enum): + ALEXANDRIA = "alexandria" + MP = "mp" + OQMD = "oqmd" diff --git a/src/lematerial_fetcher/models/utils/potcar.json b/src/lematerial_fetcher/models/utils/potcar.json new file mode 100644 index 0000000..fa19f95 --- /dev/null +++ b/src/lematerial_fetcher/models/utils/potcar.json @@ -0,0 +1,90 @@ +{ + "Ac": "PAW_PBE Ac 06Sep2000", + "Ag": "PAW_PBE Ag 06Sep2000", + "Al": "PAW_PBE Al 04Jan2001", + "Ar": "PAW_PBE Ar 07Sep2000", + "As": "PAW_PBE As 06Sep2000", + "Au": "PAW_PBE Au 
06Sep2000",
+    "Ba": "PAW_PBE Ba_sv 06Sep2000",
+    "Be": "PAW_PBE Be_sv 06Sep2000",
+    "Bi": "PAW_PBE Bi 08Apr2002",
+    "B": "PAW_PBE B 06Sep2000",
+    "Br": "PAW_PBE Br 06Sep2000",
+    "Ca": "PAW_PBE Ca_sv 06Sep2000",
+    "Cd": "PAW_PBE Cd 06Sep2000",
+    "Ce": "PAW_PBE Ce 28Sep2000",
+    "Cl": "PAW_PBE Cl 17Jan2003",
+    "Co": "PAW_PBE Co 06Sep2000",
+    "C": "PAW_PBE C 08Apr2002",
+    "Cr": "PAW_PBE Cr_pv 07Sep2000",
+    "Cs": "PAW_PBE Cs_sv 08Apr2002",
+    "Cu": "PAW_PBE Cu_pv 06Sep2000",
+    "Dy": "PAW_PBE Dy_3 06Sep2000",
+    "Er": "PAW_PBE Er_3 06Sep2000",
+    "Eu": "PAW_PBE Eu 08Apr2002",
+    "Fe": "PAW_PBE Fe_pv 06Sep2000",
+    "F": "PAW_PBE F 08Apr2002",
+    "Ga": "PAW_PBE Ga_d 06Sep2000",
+    "Gd": "PAW_PBE Gd 08Apr2002",
+    "Ge": "PAW_PBE Ge_d 06Sep2000",
+    "He": "PAW_PBE He 05Jan2001",
+    "Hf": "PAW_PBE Hf_pv 06Sep2000",
+    "Hg": "PAW_PBE Hg 06Sep2000",
+    "Ho": "PAW_PBE Ho_3 06Sep2000",
+    "H": "PAW_PBE H 15Jun2001",
+    "In": "PAW_PBE In_d 06Sep2000",
+    "I": "PAW_PBE I 08Apr2002",
+    "Ir": "PAW_PBE Ir 06Sep2000",
+    "K": "PAW_PBE K_sv 06Sep2000",
+    "Kr": "PAW_PBE Kr 07Sep2000",
+    "La": "PAW_PBE La 06Sep2000",
+    "Li": "PAW_PBE Li_sv 23Jan2001",
+    "Lu": "PAW_PBE Lu_3 06Sep2000",
+    "Mg": "PAW_PBE Mg_pv 06Sep2000",
+    "Mn": "PAW_PBE Mn_pv 07Sep2000",
+    "Mo": "PAW_PBE Mo_pv 08Apr2002",
+    "Na": "PAW_PBE Na_pv 05Jan2001",
+    "Nb": "PAW_PBE Nb_pv 08Apr2002",
+    "Nd": "PAW_PBE Nd_3 06Sep2000",
+    "Ne": "PAW_PBE Ne 05Jan2001",
+    "Ni": "PAW_PBE Ni_pv 06Sep2000",
+    "N": "PAW_PBE N 08Apr2002",
+    "Np": "PAW_PBE Np 06Sep2000",
+    "O": "PAW_PBE O 08Apr2002",
+    "Os": "PAW_PBE Os_pv 20Jan2003",
+    "Pa": "PAW_PBE Pa 07Sep2000",
+    "Pb": "PAW_PBE Pb_d 06Sep2000",
+    "Pd": "PAW_PBE Pd 05Jan2001",
+    "Pm": "PAW_PBE Pm_3 07Sep2000",
+    "P": "PAW_PBE P 17Jan2003",
+    "Pr": "PAW_PBE Pr_3 07Sep2000",
+    "Pt": "PAW_PBE Pt 05Jan2001",
+    "Pu": "PAW_PBE Pu 06Sep2000",
+    "Rb": "PAW_PBE Rb_sv 06Sep2000",
+    "Re": "PAW_PBE Re_pv 06Sep2000",
+    "Rh": "PAW_PBE Rh_pv 06Sep2000",
+    "Ru": "PAW_PBE Ru_pv 06Sep2000",
+    "Sb": "PAW_PBE Sb 06Sep2000",
+    "Sc": "PAW_PBE Sc_sv 07Sep2000",
+    "Se": "PAW_PBE Se 06Sep2000",
+    "Si": "PAW_PBE Si 05Jan2001",
+    "Sm": "PAW_PBE Sm_3 07Sep2000",
+    "Sn": "PAW_PBE Sn_d 06Sep2000",
+    "S": "PAW_PBE S 17Jan2003",
+    "Sr": "PAW_PBE Sr_sv 07Sep2000",
+    "Ta": "PAW_PBE Ta_pv 07Sep2000",
+    "Tb": "PAW_PBE Tb_3 06Sep2000",
+    "Tc": "PAW_PBE Tc_pv 06Sep2000",
+    "Te": "PAW_PBE Te 08Apr2002",
+    "Th": "PAW_PBE Th 07Sep2000",
+    "Ti": "PAW_PBE Ti_pv 07Sep2000",
+    "Tl": "PAW_PBE Tl_d 06Sep2000",
+    "Tm": "PAW_PBE Tm_3 20Jan2003",
+    "U": "PAW_PBE U 06Sep2000",
+    "V": "PAW_PBE V_sv 07Sep2000",
+    "W": "PAW_PBE W_pv 06Sep2000",
+    "Xe": "PAW_PBE Xe 07Sep2000",
+    "Y": "PAW_PBE Y_sv 06Sep2000",
+    "Zn": "PAW_PBE Zn 06Sep2000",
+    "Zr": "PAW_PBE Zr_sv 07Sep2000"
+}
diff --git a/src/lematerial_fetcher/push.py b/src/lematerial_fetcher/push.py
index 8edb131..c8f01cc 100644
--- a/src/lematerial_fetcher/push.py
+++ b/src/lematerial_fetcher/push.py
@@ -56,6 +56,11 @@ def __init__(
     ):
         self.config = config
         self.data_type = data_type
+        self.table_names = (
+            [self.config.source_table_name]
+            if isinstance(self.config.source_table_name, str)
+            else self.config.source_table_name
+        )
 
         assert self.data_type in ["optimade", "trajectories", "any"], (
             f"Invalid data type: {self.data_type}, "
@@ -76,7 +81,7 @@ def __init__(
         self.max_rows = self.config.max_rows
 
         if self.config.data_dir is None:
-            self.data_dir = get_cache_dir() / f"push/{self.config.source_table_name}"
+            self.data_dir = get_cache_dir() / f"push/{'_'.join(self.table_names)}"
         else:
             self.data_dir = Path(self.config.data_dir)
         self.data_dir.mkdir(parents=True, exist_ok=True)
@@ -120,13 +125,16 @@ def _get_optimade_features(self) -> Features:
                 "elements_ratios": Sequence(Value("float64")),
                 "stress_tensor": Sequence(Sequence(Value("float64"))),
                 "energy": Value("float64"),
+                "energy_corrected": Value("float64"),
                 "magnetic_moments": Sequence(Value("float64")),
                 "forces": Sequence(Sequence(Value("float64"))),
                 "total_magnetization": Value("float64"),
+                "charges": Sequence(Value("float64")),
                 "dos_ef": Value("float64"),
                 "functional": Value("string"),
                 "cross_compatibility": Value("bool"),
-                # "entalpic_fingerprint": Value("string"),  # TODO(Ramlaoui): Add this back in later
+                "bawl_fingerprint": Value("string"),
+                "space_group_it_number": Value("int32"),
             }
         )
@@ -156,10 +164,13 @@ def _get_trajectories_features(self) -> Features:
                 "relaxation_number": (Value("int32")),
             }
         )
-        # We do not have magnetic moments, total magnetization, and dos_ef in trajectories
+        # We do not have magnetic_moments, dos_ef, charges, total_magnetization,
+        # or bawl_fingerprint in trajectories
         del features["magnetic_moments"]
         del features["dos_ef"]
+        del features["charges"]
         del features["total_magnetization"]
+        del features["bawl_fingerprint"]
 
         convert_features_dict.update(
             {
@@ -267,93 +279,98 @@ def download_db_as_csv(self, limit_query: str, data_dir: Path) -> Dataset | None
         conn = psycopg2.connect(self.conn_str)
         try:
             # Check if the table is empty
-            with conn.cursor(name="server_cursor") as cur:
-                query = f"SELECT EXISTS(SELECT 1 FROM {self.config.source_table_name} {limit_query} LIMIT 1);"
-                cur.execute(query)
-                has_rows = cur.fetchone()[0]
-
-                if not has_rows:
-                    return None
-
-            # Get all the ids in the table to have faster queries later
-            with conn.cursor(name="server_cursor") as cur:
-                query = f"SELECT id FROM {self.config.source_table_name} {limit_query}"
+            for table_name in self.table_names:
+                logger.info(f"Processing table: {table_name}")
+
+                with conn.cursor(name="server_cursor") as cur:
+                    query = f"SELECT EXISTS(SELECT 1 FROM {table_name} {limit_query} LIMIT 1);"
+                    cur.execute(query)
+                    has_rows = cur.fetchone()[0]
+
+                    if not has_rows:
+                        return None
+
+                # Get all the ids in the table to have faster queries later
+                with conn.cursor(name="server_cursor") as cur:
+                    query = f"SELECT id FROM {table_name} {limit_query}"
+                    if self.max_rows is not None and self.max_rows != -1:
+                        query += f" LIMIT {self.max_rows};"
+                    else:
+                        query += ";"
+                    cur.execute(query)
+                    ids = [row[0] for row in cur.fetchall()]
+
+                total_rows = len(ids)
+                logger.info(f"Total rows: {total_rows}")
+
+                # Apply max_rows limit if specified
                 if self.max_rows is not None and self.max_rows != -1:
-                    query += f" LIMIT {self.max_rows};"
-                else:
-                    query += ";"
-                cur.execute(query)
-                ids = [row[0] for row in cur.fetchall()]
-
-            total_rows = len(ids)
-            logger.info(f"Total rows: {total_rows}")
+                    total_rows = min(self.max_rows, total_rows)
 
-            # Apply max_rows limit if specified
-            if self.max_rows is not None and self.max_rows != -1:
-                total_rows = min(self.max_rows, total_rows)
+                chunk_size = min(self.config.chunk_size, total_rows)
+                num_chunks = (total_rows + chunk_size - 1) // chunk_size
 
-            chunk_size = min(self.config.chunk_size, total_rows)
-            num_chunks = (total_rows + chunk_size - 1) // chunk_size
-
-            # Will copy all columns if data_type is "any"
-            if self.columns is None:
-                columns = "*"
-            else:
-                columns = ", ".join(self.columns)
-
-            ids_at_offset = [ids[i * chunk_size] for i in range(num_chunks)]
-            del ids
-
-            # Process chunks in parallel if not in debug mode
-            if self.debug:
-                for i in range(num_chunks):
-                    self.process_chunk(
-                        chunk_index=i,
-                        id_at_offset=ids_at_offset[i],
-                        chunk_size=chunk_size,
-                        num_chunks=num_chunks,
-                        data_dir=data_dir,
-                        conn_str=self.conn_str,
-                        config=self.config,
-                        limit_query=limit_query,
-                        columns=columns,
-                    )
-            else:
-                chunk_tasks = [
-                    (
-                        i,
-                        ids_at_offset[i],
-                        chunk_size,
-                        num_chunks,
-                        data_dir,
-                        self.conn_str,
-                        self.config,
-                        limit_query,
-                        columns,
-                    )
-                    for i in range(num_chunks)
-                ]
-
-                with ProcessPoolExecutor(
-                    max_workers=self.config.num_workers
-                ) as executor:
-                    futures = {
-                        executor.submit(self.process_chunk, *task): task
-                        for task in chunk_tasks
-                    }
-
-                    # Process results as they complete
-                    for future in futures:
-                        try:
-                            result = future.result()
-                            if not result:
-                                logger.warning(
-                                    f"Failed to process chunk {futures[future][0]}"
+                # Will copy all columns if data_type is "any"
+                if self.columns is None:
+                    columns = "*"
+                else:
+                    columns = ", ".join(self.columns)
+
+                ids_at_offset = [ids[i * chunk_size] for i in range(num_chunks)]
+                del ids
+
+                # Process chunks in parallel if not in debug mode
+                if self.debug:
+                    for i in range(num_chunks):
+                        self.process_chunk(
+                            chunk_index=i,
+                            id_at_offset=ids_at_offset[i],
+                            chunk_size=chunk_size,
+                            num_chunks=num_chunks,
+                            data_dir=data_dir,
+                            conn_str=self.conn_str,
+                            config=self.config,
+                            limit_query=limit_query,
+                            columns=columns,
+                            table_name=table_name,
+                        )
+                else:
+                    chunk_tasks = [
+                        (
+                            i,
+                            ids_at_offset[i],
+                            chunk_size,
+                            num_chunks,
+                            data_dir,
+                            self.conn_str,
+                            self.config,
+                            limit_query,
+                            columns,
+                            table_name,
+                        )
+                        for i in range(num_chunks)
+                    ]
+
+                    with ProcessPoolExecutor(
+                        max_workers=self.config.num_workers
+                    ) as executor:
+                        futures = {
+                            executor.submit(self.process_chunk, *task): task
+                            for task in chunk_tasks
+                        }
+
+                        # Process results as they complete
+                        for future in futures:
+                            try:
+                                result = future.result()
+                                if not result:
+                                    logger.warning(
+                                        f"Failed to process chunk {futures[future][0]}"
+                                    )
+                            except Exception as e:
+                                logger.error(
+                                    f"Error processing chunk {futures[future][0]}: {str(e)}"
                                 )
-                        except Exception as e:
-                            logger.error(
-                                f"Error processing chunk {futures[future][0]}: {str(e)}"
-                            )
         finally:
             conn.close()
@@ -375,8 +392,9 @@ def process_chunk(
         config,
         limit_query,
         columns,
+        table_name,
     ):
-        chunk_file = data_dir / f"chunk_{chunk_index}.jsonl"
+        chunk_file = data_dir / f"chunk_{chunk_index}_{table_name}.jsonl"
 
         # Skip if file already exists
         if chunk_file.exists():
@@ -394,7 +412,7 @@ def process_chunk(
                 SELECT row_to_json(t) FROM (
                     SELECT {columns}
-                    FROM {config.source_table_name}
+                    FROM {table_name}
                     {limit_query}
                 """
diff --git a/src/lematerial_fetcher/utils/cli.py b/src/lematerial_fetcher/utils/cli.py
index 5fdfdd6..20c2f15 100644
--- a/src/lematerial_fetcher/utils/cli.py
+++ b/src/lematerial_fetcher/utils/cli.py
@@ -257,6 +257,7 @@ def add_push_options(f):
         "--table-name",
         type=str,
         envvar="LEMATERIALFETCHER_TABLE_NAME",
+        multiple=True,
         help="Table name to push data from.",
     ),
     click.option(
diff --git a/src/lematerial_fetcher/utils/config.py b/src/lematerial_fetcher/utils/config.py
index 7497035..f7c5746 100644
--- a/src/lematerial_fetcher/utils/config.py
+++ b/src/lematerial_fetcher/utils/config.py
@@ -48,7 +48,7 @@ class TransformerConfig(BaseConfig):
 @dataclass
 class PushConfig(BaseConfig):
     source_db_conn_str: str
-    source_table_name: str
+    source_table_name: str | list[str]
     hf_repo_id: str
     hf_token: str | None = None
     data_dir: str | None = None
@@ -345,7 +345,7 @@ def load_push_config(
     db_user: Optional[str] = None,
     db_host: str = "localhost",
     db_name: Optional[str] = None,
-    table_name: Optional[str] = None,
+    table_name: Optional[str | list[str]] = None,
     hf_repo_id: Optional[str] = None,
     hf_token: Optional[str] = None,
     data_dir: Optional[str] = None,
diff --git a/src/lematerial_fetcher/utils/logging.py b/src/lematerial_fetcher/utils/logging.py
index 4405f48..ea5f6f2 100644
--- a/src/lematerial_fetcher/utils/logging.py
+++ b/src/lematerial_fetcher/utils/logging.py
@@ -88,4 +88,6 @@ def fatal(self, message: str, *args, **kwargs):
         self.term_logger.fatal(message, stacklevel=2, *args, **kwargs)
 
 
-logger = Logger()
+logger = Logger(
+    level="DEBUG" if os.environ.get("LEMATERIALFETCHER_DEBUG", None) else "INFO"
+)
diff --git a/src/lematerial_fetcher/utils/structure.py b/src/lematerial_fetcher/utils/structure.py
index 8fc0e24..d6fd13d 100644
--- a/src/lematerial_fetcher/utils/structure.py
+++ b/src/lematerial_fetcher/utils/structure.py
@@ -1,5 +1,5 @@
 import numpy as np
-from pymatgen.core import Structure
+from pymatgen.core import Composition, Structure
 
 
 def get_element_ratios_from_composition_reduced(
@@ -13,6 +13,29 @@ def get_element_ratios_from_composition_reduced(
     return element_ratios
 
 
+def get_composition_reduced_from_reduced_dict(reduced_dict: dict[str, float]) -> str:
+    """
+    Builds the reduced chemical formula string from a reduced composition dictionary.
+    """
+    items_reduced = [
+        f"{element}{int(reduced_dict[element])}"
+        if int(reduced_dict[element]) > 1
+        else element
+        for element in sorted(list(reduced_dict.keys()))  # alphabetical order
+    ]
+    chemical_formula_reduced = "".join(items_reduced)
+    return chemical_formula_reduced
+
+
+def get_composition_reduced_from_descriptive_formula(batch):
+    for i in range(len(batch["chemical_formula_descriptive"])):
+        composition = Composition(batch["chemical_formula_descriptive"][i])
+        batch["chemical_formula_reduced"][i] = (
+            get_composition_reduced_from_reduced_dict(composition.to_reduced_dict)
+        )
+    return batch
+
+
 def get_optimade_from_pymatgen(structure: Structure) -> dict:
     """
     Extracts the possible fields from a pymatgen Structure object
@@ -36,13 +59,10 @@ def get_optimade_from_pymatgen(structure: Structure) -> dict:
     elements_ratios = get_element_ratios_from_composition_reduced(reduced_dict)
 
     # Formula fields
-    chemical_formula_reduced = "".join(
-        f"{element}{int(ratio)}" if int(ratio) > 1 else element
-        for element, ratio in zip(elements, elements_ratios)
-    )
+    chemical_formula_reduced = get_composition_reduced_from_reduced_dict(reduced_dict)
     chemical_formula_anonymous = structure.composition.anonymized_formula
     # TODO(Ramlaoui): Maybe we should use the factor here?
-    chemical_formula_descriptive = str(structure.composition)
+    chemical_formula_descriptive = structure.composition.formula
 
     # Site and position data
     cartesian_site_positions = structure.cart_coords.tolist()
@@ -50,14 +70,14 @@ def get_optimade_from_pymatgen(structure: Structure) -> dict:
     species = [
         {
             "mass": None,
-            "name": str(site.specie),
+            "name": element,
             "attached": None,
             "nattached": None,
             "concentration": [1],
             "original_name": None,
-            "chemical_symbols": [str(site.specie)],
+            "chemical_symbols": [element],
         }
-        for site in structure.sites
+        for element in elements
     ]
 
     # Structure metadata
diff --git a/tests/models/test_optimade_model.py b/tests/models/test_optimade_model.py
index 67471d3..39ecf53 100644
--- a/tests/models/test_optimade_model.py
+++ b/tests/models/test_optimade_model.py
@@ -8,7 +8,7 @@
 # Test data for a valid structure
 VALID_STRUCTURE_DATA = {
     "id": "test_id",
-    "source": "test_source",
+    "source": "oqmd",
     "elements": ["Al", "O"],  # Alphabetically ordered
     "nelements": 2,
     "elements_ratios": [0.4, 0.6],  # Sum to 1.0
@@ -44,7 +44,7 @@ def test_optional_fields():
         "stress_tensor": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
         "energy": -100.0,
         "magnetic_moments": [0.0, 1.0],
-        "forces": [[0.0, 0.0, 0.0], [0.1, 0.1, 0.1]],
+        "forces": [[0.0, 0.0, 0.0], [0.01, 0.01, 0.01]],
         "total_magnetization": 1.0,
         "dos_ef": 0.5,
         "functional": Functional.PBE,
@@ -90,7 +90,7 @@ def test_invalid_forces():
     """Test validation of forces dimensions."""
     data = VALID_STRUCTURE_DATA.copy()
     data["forces"] = [[1.0, 0.0], [0.0, 1.0]]  # Not 3D vectors
-    with pytest.raises(ValueError, match="Vector must have exactly 3 components"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
 
@@ -98,7 +98,7 @@ def test_invalid_positions():
     """Test validation of cartesian positions dimensions."""
     data = VALID_STRUCTURE_DATA.copy()
     data["cartesian_site_positions"] = [[1.0, 0.0], [0.0, 1.0]]  # Not 3D vectors
-    with pytest.raises(ValueError, match="Vector must have exactly 3 components"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
 
@@ -106,7 +106,7 @@ def test_inconsistent_site_counts():
     """Test validation of site count consistency."""
     data = VALID_STRUCTURE_DATA.copy()
     data["nsites"] = 3  # Doesn't match length of positions
-    with pytest.raises(ValueError, match="List must have exactly 3 items"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
 
@@ -114,26 +114,26 @@ def test_invalid_date_format():
     """Test validation of last_modified date format."""
     data = VALID_STRUCTURE_DATA.copy()
     data["last_modified"] = "2024-13-13"  # Invalid format
-    with pytest.raises(ValueError, match="Input should be a valid datetime"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
 
 def test_empty_required_fields():
     """Test validation of empty required fields."""
     required_fields = [
-        ("elements", [], "List should have at least 1 item after validation"),
-        ("source", "", "String should have at least 1 character"),
-        ("id", "", "String should have at least 1 character"),
-        ("chemical_formula_anonymous", "", "String should have at least 1 character"),
-        ("chemical_formula_descriptive", "", "String should have at least 1 character"),
-        ("chemical_formula_reduced", "", "String should have at least 1 character"),
-        ("immutable_id", "", "String should have at least 1 character"),
+        ("elements", []),
+        ("source", ""),
+        ("id", ""),
+        ("chemical_formula_anonymous", ""),
+        ("chemical_formula_descriptive", ""),
+        ("chemical_formula_reduced", ""),
+        ("immutable_id", ""),
     ]
 
-    for field, empty_value, error_msg in required_fields:
+    for field, empty_value in required_fields:
         data = VALID_STRUCTURE_DATA.copy()
         data[field] = empty_value
-        with pytest.raises(ValueError, match=error_msg):
+        with pytest.raises(ValueError):
             OptimadeStructure(**data)
 
 
@@ -143,16 +143,12 @@ def test_invalid_dimension_types():
 
     # Test too many dimensions
     data["dimension_types"] = [1, 1, 1, 1]
-    with pytest.raises(
-        ValueError, match="List should have at most 3 items after validation"
-    ):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
     # Test empty dimensions
     data["dimension_types"] = []
-    with pytest.raises(
-        ValueError, match="List should have at least 1 item after validation"
-    ):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
 
@@ -162,12 +158,12 @@ def test_invalid_nperiodic_dimensions():
 
     # Test negative value
     data["nperiodic_dimensions"] = -1
-    with pytest.raises(ValueError, match="Input should be greater than or equal to 0"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
     # Test too many dimensions
     data["nperiodic_dimensions"] = 4
-    with pytest.raises(ValueError, match="Input should be less than or equal to 3"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
 
@@ -177,12 +173,12 @@ def test_invalid_lattice_vectors():
 
     # Test wrong number of vectors
     data["lattice_vectors"] = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
-    with pytest.raises(ValueError, match="Matrix must be a 3x3 matrix"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
    # Test wrong vector dimensions
     data["lattice_vectors"] = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
-    with pytest.raises(ValueError, match="Matrix must be a 3x3 matrix"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
 
@@ -202,22 +198,19 @@ def test_cross_field_validation():
 
     # Test elements/ratios length mismatch
     data["elements_ratios"] = [0.4, 0.3, 0.3]
-    with pytest.raises(
-        ValueError,
-        match=r"Number of elements \(\d+\) must match number of element ratios \(\d+\)",
-    ):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
     # Test species_at_sites length mismatch
     data = VALID_STRUCTURE_DATA.copy()
     data["species_at_sites"] = ["Al"]
-    with pytest.raises(ValueError, match="List must have exactly 2 items"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
     # Test magnetic_moments length mismatch
     data = VALID_STRUCTURE_DATA.copy()
     data["magnetic_moments"] = [1.0]  # Should be length 2 to match nsites
-    with pytest.raises(ValueError, match="List must have exactly 2 items"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
 
@@ -227,21 +221,19 @@ def test_optional_field_validation():
 
     # Test invalid stress tensor format
     data["stress_tensor"] = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0, 1.0]]
-    with pytest.raises(ValueError, match="Matrix must be a 3x3 matrix"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
     # Test invalid forces format
     data = VALID_STRUCTURE_DATA.copy()
     data["forces"] = [[1.0, 0.0], [0.0, 1.0, 0.0]]  # Inconsistent dimensions
-    with pytest.raises(
-        ValueError, match="Invalid vector format: Vector must have exactly 3 components"
-    ):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
     # Test invalid magnetic moments (not matching nsites)
     data = VALID_STRUCTURE_DATA.copy()
     data["magnetic_moments"] = [1.0, 2.0, 3.0]  # Too many values for nsites=2
-    with pytest.raises(ValueError, match="List must have exactly 2 items"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
 
@@ -249,7 +241,7 @@ def test_functional_enum():
     """Test validation of functional enum values."""
     data = VALID_STRUCTURE_DATA.copy()
     data["functional"] = "INVALID"  # Invalid functional
-    with pytest.raises(ValueError, match=r"Input should be 'pbe', 'pbesol' or 'scan'"):
+    with pytest.raises(ValueError):
         OptimadeStructure(**data)
 
     # Test valid functionals
diff --git a/uv.lock b/uv.lock
index d267db6..64868fc 100644
--- a/uv.lock
+++ b/uv.lock
@@ -112,6 +112,20 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643 },
 ]
 
+[[package]]
+name = "ase"
+version = "3.24.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "matplotlib" },
+    { name = "numpy" },
+    { name = "scipy" },
+]
+sdist = { url =
"https://files.pythonhosted.org/packages/5c/c9/9adb9bc641bd7222367886e4e6c753b4c64da4ff2d9565ab39aee1e34734/ase-3.24.0.tar.gz", hash = "sha256:9acc93d6daaf48cd27b844c56f8bf49428b9db0542faa3cc30d9d5b8e1842195", size = 2383264 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/1f/cd/b1253035a1da90e89f31947e052c558cd83df3bcaff34aa199e5e806d773/ase-3.24.0-py3-none-any.whl", hash = "sha256:974922df87ef4ec8cf1140359a55ab4c4dc55c38e26876bdd9c00968da1f463c", size = 2928893 },
+]
+
 [[package]]
 name = "astroid"
 version = "3.3.9"
@@ -139,6 +153,25 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/77/06/bb80f5f86020c4551da315d78b3ab75e8228f89f0162f2c3a819e407941a/attrs-25.3.0-py3-none-any.whl", hash = "sha256:427318ce031701fea540783410126f03899a97ffc6f61596ad581ac2e40e3bc3", size = 63815 },
 ]
 
+[[package]]
+name = "average-minimum-distance"
+version = "1.5.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "gemmi" },
+    { name = "joblib" },
+    { name = "numba" },
+    { name = "numpy" },
+    { name = "pandas" },
+    { name = "scikit-learn" },
+    { name = "scipy" },
+    { name = "tqdm" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/27/cf/7c802c5c29ce55f08c0ac3fe4475bdd185d22b84ace674498cbf3f3f3272/average-minimum-distance-1.5.3.tar.gz", hash = "sha256:2867a392b5cf845a068fa8d1a73cc864af6706cb3c22ccf54e031832d90e81e2", size = 3507093 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/cb/5b/47e411507cac3f67be1e22bbcf7996041b5fa06e3e8775a524458c2b17d9/average_minimum_distance-1.5.3-py3-none-any.whl", hash = "sha256:dd7060b7a02dc9b6838d7e4a84a316ad97688cf9e50857cea0fa528ebcd3284f", size = 102505 },
+]
+
 [[package]]
 name = "babel"
 version = "2.17.0"
@@ -516,6 +549,29 @@ http = [
     { name = "aiohttp" },
 ]
 
+[[package]]
+name = "gemmi"
+version = "0.7.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url =
"https://files.pythonhosted.org/packages/f9/71/31b3706e939501daf06c87fe8a13d4c223d6c3f8bbe9889374047d5ea176/gemmi-0.7.1.tar.gz", hash = "sha256:73bb4a2c574ef7586efdf0161aae22bb75c0301af5e9cc22252877e707facdd2", size = 1355484 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/b8/82/b507d5a0084db70951e970ca4eadda035ff16913e61bff1d334889e16086/gemmi-0.7.1-cp311-cp311-macosx_10_14_x86_64.whl", hash = "sha256:116d1f84eb3fe2e8b80a3d0736f10b9d946d207a79b8998211ef3207037f54aa", size = 2638416 }, + { url = "https://files.pythonhosted.org/packages/1b/0b/80683539832fa5048cb75336228fc2f19f8b4977e9e820277abc8babafeb/gemmi-0.7.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:3f8924cdf61b3467441289cb65a6af3d7752143ef8f0a350c055746d1bfca2d7", size = 2280279 }, + { url = "https://files.pythonhosted.org/packages/fe/95/420a6b84d0e5306b366bd6d2e246f5dd33018bc7cad194ff187845245a82/gemmi-0.7.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2cbabee2e1a9f5bcfc249e8383ba551f84e1cfd25ca1af5109bb8d8c867be9c3", size = 2549411 }, + { url = "https://files.pythonhosted.org/packages/97/74/ffa359c7093ac5a455fb4670f9aebd2329cbb7530226a08b4e9335547a55/gemmi-0.7.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:36b32f8b3b98ab6aa0b92a3d6d944bbb2ee2191f3607b2df966ad4799bddadf3", size = 3062714 }, + { url = "https://files.pythonhosted.org/packages/bd/15/ed7d491c2e1c8c13ed3c800b71ef267c0d50bacf9091773edce911392e17/gemmi-0.7.1-cp311-cp311-win_amd64.whl", hash = "sha256:38fd01bf1e9373fbb18aa64a12ccec320204c4ffb2d37b0f650be55b5f674495", size = 1926637 }, + { url = "https://files.pythonhosted.org/packages/de/32/43020472d5a5ef2f57d96fd33e44d2c44129896a9cfd8e1dca8c15898a38/gemmi-0.7.1-cp312-cp312-macosx_10_14_x86_64.whl", hash = "sha256:3f4421a5e38ef3f1474b466f888c1517d813b99935aafe52f1da22053b1eb827", size = 2658538 }, + { url = 
"https://files.pythonhosted.org/packages/e2/a0/bc5b1719a7cb0c30edff6fe5da7d1acaa543e9bd578dc654eafce169a72c/gemmi-0.7.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:0388a02758c4d518d6be2a7313cf81b1292978658e24b655f1112ed3764826ff", size = 2283568 }, + { url = "https://files.pythonhosted.org/packages/9a/d1/035c45c28f0b17d14600eeb04f27667d233d050bb01c0f8ec316d0773f4a/gemmi-0.7.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a5935c5b5c510c223f9afad9261c268118d6dd63511f9dc8707e50b9ca771e78", size = 2528962 }, + { url = "https://files.pythonhosted.org/packages/4f/09/adea72793f2276ad654c7e98d10e1e4076517390d4c99b4f3080a1722d21/gemmi-0.7.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:08080973b422602f6bd983ec2ff909601a283d358175bd84cbb8bf43d9eeeb4f", size = 3054102 }, + { url = "https://files.pythonhosted.org/packages/d3/8f/bcd00bd14e58a8f9bac3ed0794221ba234f0d0dae4aa5ed470faafc1d9ac/gemmi-0.7.1-cp312-cp312-win_amd64.whl", hash = "sha256:9aee1a50248c259c44aff20c3d1b3a246b00536279e22f24389e45674f9de5b3", size = 1928456 }, + { url = "https://files.pythonhosted.org/packages/80/46/7bb321130abd77c4ac3e2a885d47e9230c227e82d902d4c5ea6e89202503/gemmi-0.7.1-cp313-cp313-macosx_10_14_x86_64.whl", hash = "sha256:60afbd55b0f9909684f71e22915a3be6985bd6125d9056acc3531d3b83c6421a", size = 2658567 }, + { url = "https://files.pythonhosted.org/packages/5c/5b/1d6842cd88f2a37ec31dcceb2475d302b78bd61bc01be1c8188f05a07cb8/gemmi-0.7.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:100a150da8f47db8e0c329ca87e5b4479292b33c6aee3529cd5a8451321a624f", size = 2283606 }, + { url = "https://files.pythonhosted.org/packages/7b/aa/e333d42318c9668e1c3c5571ce6007d29ed3da1814d849e0ac35c4be3edc/gemmi-0.7.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e0a9358fec36cad3f9f55e5ca927e378bd48ded30522ae257153787b699ac303", size = 2527544 }, + { url = 
"https://files.pythonhosted.org/packages/3b/39/c60a140a2b52eb1efce62486aef47090fe54c603891b47037af61f5ae316/gemmi-0.7.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:658ce0578eb966530f3733738130120e7305a0fa421349089279a0164ac24e23", size = 3054136 }, + { url = "https://files.pythonhosted.org/packages/3a/5d/b645a1e7c71ba562cf31987ee7499f603b6b49f67ccab521b3b600f53a1e/gemmi-0.7.1-cp313-cp313-win_amd64.whl", hash = "sha256:402a71c935cab167ac6a7a29045e47a972388ef6f62fa3f477d8b0241fe53d4e", size = 1928436 }, +] + [[package]] name = "huggingface-hub" version = "0.29.3" @@ -769,11 +825,14 @@ name = "lematerial-fetcher" version = "0.1.0" source = { editable = "." } dependencies = [ + { name = "ase" }, { name = "beautifulsoup4" }, { name = "boto3" }, { name = "click" }, { name = "datasets" }, { name = "ijson" }, + { name = "material-hasher" }, + { name = "moyopy" }, { name = "mysql-connector-python" }, { name = "numpy" }, { name = "psycopg2-binary" }, @@ -808,11 +867,14 @@ dev = [ [package.metadata] requires-dist = [ + { name = "ase", specifier = ">=3.24.0" }, { name = "beautifulsoup4", specifier = ">=4.13.3" }, { name = "boto3", specifier = ">=1.36.20" }, { name = "click", specifier = ">=8.1.8" }, { name = "datasets", specifier = ">=3.4.1" }, { name = "ijson", specifier = ">=3.3.0" }, + { name = "material-hasher", git = "https://github.com/LeMaterial/lematerial-hasher.git" }, + { name = "moyopy", specifier = ">=0.4.2" }, { name = "mysql-connector-python", specifier = ">=9.2.0" }, { name = "numpy", specifier = ">=2.1.2" }, { name = "psycopg2-binary", specifier = ">=2.9.10" }, @@ -845,6 +907,42 @@ dev = [ { name = "sphinxawesome-theme", specifier = ">=5.3.2" }, ] +[[package]] +name = "llvmlite" +version = "0.44.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/89/6a/95a3d3610d5c75293d5dbbb2a76480d5d4eeba641557b69fe90af6c5b84e/llvmlite-0.44.0.tar.gz", hash = 
"sha256:07667d66a5d150abed9157ab6c0b9393c9356f229784a4385c02f99e94fc94d4", size = 171880 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/b5/e2/86b245397052386595ad726f9742e5223d7aea999b18c518a50e96c3aca4/llvmlite-0.44.0-cp311-cp311-macosx_10_14_x86_64.whl", hash = "sha256:eed7d5f29136bda63b6d7804c279e2b72e08c952b7c5df61f45db408e0ee52f3", size = 28132305 }, + { url = "https://files.pythonhosted.org/packages/ff/ec/506902dc6870249fbe2466d9cf66d531265d0f3a1157213c8f986250c033/llvmlite-0.44.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:ace564d9fa44bb91eb6e6d8e7754977783c68e90a471ea7ce913bff30bd62427", size = 26201090 }, + { url = "https://files.pythonhosted.org/packages/99/fe/d030f1849ebb1f394bb3f7adad5e729b634fb100515594aca25c354ffc62/llvmlite-0.44.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c5d22c3bfc842668168a786af4205ec8e3ad29fb1bc03fd11fd48460d0df64c1", size = 42361858 }, + { url = "https://files.pythonhosted.org/packages/d7/7a/ce6174664b9077fc673d172e4c888cb0b128e707e306bc33fff8c2035f0d/llvmlite-0.44.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f01a394e9c9b7b1d4e63c327b096d10f6f0ed149ef53d38a09b3749dcf8c9610", size = 41184200 }, + { url = "https://files.pythonhosted.org/packages/5f/c6/258801143975a6d09a373f2641237992496e15567b907a4d401839d671b8/llvmlite-0.44.0-cp311-cp311-win_amd64.whl", hash = "sha256:d8489634d43c20cd0ad71330dde1d5bc7b9966937a263ff1ec1cebb90dc50955", size = 30331193 }, + { url = "https://files.pythonhosted.org/packages/15/86/e3c3195b92e6e492458f16d233e58a1a812aa2bfbef9bdd0fbafcec85c60/llvmlite-0.44.0-cp312-cp312-macosx_10_14_x86_64.whl", hash = "sha256:1d671a56acf725bf1b531d5ef76b86660a5ab8ef19bb6a46064a705c6ca80aad", size = 28132297 }, + { url = "https://files.pythonhosted.org/packages/d6/53/373b6b8be67b9221d12b24125fd0ec56b1078b660eeae266ec388a6ac9a0/llvmlite-0.44.0-cp312-cp312-macosx_11_0_arm64.whl", hash = 
"sha256:5f79a728e0435493611c9f405168682bb75ffd1fbe6fc360733b850c80a026db", size = 26201105 }, + { url = "https://files.pythonhosted.org/packages/cb/da/8341fd3056419441286c8e26bf436923021005ece0bff5f41906476ae514/llvmlite-0.44.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c0143a5ef336da14deaa8ec26c5449ad5b6a2b564df82fcef4be040b9cacfea9", size = 42361901 }, + { url = "https://files.pythonhosted.org/packages/53/ad/d79349dc07b8a395a99153d7ce8b01d6fcdc9f8231355a5df55ded649b61/llvmlite-0.44.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d752f89e31b66db6f8da06df8b39f9b91e78c5feea1bf9e8c1fba1d1c24c065d", size = 41184247 }, + { url = "https://files.pythonhosted.org/packages/e2/3b/a9a17366af80127bd09decbe2a54d8974b6d8b274b39bf47fbaedeec6307/llvmlite-0.44.0-cp312-cp312-win_amd64.whl", hash = "sha256:eae7e2d4ca8f88f89d315b48c6b741dcb925d6a1042da694aa16ab3dd4cbd3a1", size = 30332380 }, + { url = "https://files.pythonhosted.org/packages/89/24/4c0ca705a717514c2092b18476e7a12c74d34d875e05e4d742618ebbf449/llvmlite-0.44.0-cp313-cp313-macosx_10_14_x86_64.whl", hash = "sha256:319bddd44e5f71ae2689859b7203080716448a3cd1128fb144fe5c055219d516", size = 28132306 }, + { url = "https://files.pythonhosted.org/packages/01/cf/1dd5a60ba6aee7122ab9243fd614abcf22f36b0437cbbe1ccf1e3391461c/llvmlite-0.44.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:9c58867118bad04a0bb22a2e0068c693719658105e40009ffe95c7000fcde88e", size = 26201090 }, + { url = "https://files.pythonhosted.org/packages/d2/1b/656f5a357de7135a3777bd735cc7c9b8f23b4d37465505bd0eaf4be9befe/llvmlite-0.44.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:46224058b13c96af1365290bdfebe9a6264ae62fb79b2b55693deed11657a8bf", size = 42361904 }, + { url = "https://files.pythonhosted.org/packages/d8/e1/12c5f20cb9168fb3464a34310411d5ad86e4163c8ff2d14a2b57e5cc6bac/llvmlite-0.44.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = 
"sha256:aa0097052c32bf721a4efc03bd109d335dfa57d9bffb3d4c24cc680711b8b4fc", size = 41184245 }, + { url = "https://files.pythonhosted.org/packages/d0/81/e66fc86539293282fd9cb7c9417438e897f369e79ffb62e1ae5e5154d4dd/llvmlite-0.44.0-cp313-cp313-win_amd64.whl", hash = "sha256:2fb7c4f2fb86cbae6dca3db9ab203eeea0e22d73b99bc2341cdf9de93612e930", size = 30331193 }, +] + +[[package]] +name = "loguru" +version = "0.7.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "win32-setctime", marker = "sys_platform == 'win32'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/3a/05/a1dae3dffd1116099471c643b8924f5aa6524411dc6c63fdae648c4f1aca/loguru-0.7.3.tar.gz", hash = "sha256:19480589e77d47b8d85b2c827ad95d49bf31b0dcde16593892eb51dd18706eb6", size = 63559 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0c/29/0348de65b8cc732daa3e33e67806420b2ae89bdce2b04af740289c5c6c8c/loguru-0.7.3-py3-none-any.whl", hash = "sha256:31a33c10c8e1e10422bfd431aeb5d351c7cf7fa671e3c4df004162264b28220c", size = 61595 }, +] + [[package]] name = "lxml" version = "5.3.1" @@ -964,6 +1062,21 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/4f/65/6079a46068dfceaeabb5dcad6d674f5f5c61a6fa5673746f42a9f4c233b3/MarkupSafe-3.0.2-cp313-cp313t-win_amd64.whl", hash = "sha256:e444a31f8db13eb18ada366ab3cf45fd4b31e4db1236a4448f68778c1d1a5a2f", size = 15739 }, ] +[[package]] +name = "material-hasher" +version = "0.1.0" +source = { git = "https://github.com/LeMaterial/lematerial-hasher.git#7f63534e7a22033ea02d694cc7fdf9f4e52450ee" } +dependencies = [ + { name = "average-minimum-distance" }, + { name = "datasets" }, + { name = "moyopy" }, + { name = "pip" }, + { name = "pymatgen" }, + { name = "setuptools" }, + { name = "structuregraph-helpers" }, + { name = "torch" }, +] + [[package]] name = "matplotlib" version = "3.10.1" @@ -1041,6 +1154,22 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/eb/df/b3a36544734be3ac0eacf11bcfb8609464dd07d8bad0dff6e46109c68002/monty-2025.3.3-py3-none-any.whl", hash = "sha256:5eadb6d748c007bc63c34eceb2d80faff18f3996121d261dbceeea22adc58775", size = 51925 }, ] +[[package]] +name = "moyopy" +version = "0.4.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/ed/1f/b058d84a3a9b6807fe679483f87b3959d8562a4c2b0da691b488d944213e/moyopy-0.4.2.tar.gz", hash = "sha256:e801c47bd353e3f7803e6fb13eaac6f716c0d92009e61c9f3fd7d2e8d1d34142", size = 172843 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c0/2c/69f96c9a1037d680e15b8ecc10819f3db2ee5d996f0ba571ac926b80a5f9/moyopy-0.4.2-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:b93529bcbacef8472befd032e10ede0a9bde9f0918c83907157ccafb06b67cee", size = 974982 }, + { url = "https://files.pythonhosted.org/packages/f6/06/4dfd1e5257e9dc4ee3486ddc5f912abafd55db91685bac8ef173a6c24cc3/moyopy-0.4.2-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:9f9360b25fbc05e79da57c92cbfbb4425a03b2a13b50ddb782d83c4aebdc5296", size = 943535 }, + { url = "https://files.pythonhosted.org/packages/e8/9c/2cd7457ee369321a7e4c976486d84efcd8570414de562effe3d53957dc93/moyopy-0.4.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8f2c9eb0aa660a433ffc274e70b0b55c72e65a8c0f454fff1109af402eec8f43", size = 1111824 }, + { url = "https://files.pythonhosted.org/packages/82/ca/70721bef85a21d5b3217fd0ce51312f70949cd9976bb1357a34c5717e6c0/moyopy-0.4.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e4473b962e747756d2acdcbd44498c6886021a0a5c301e84e1313f9d1282a681", size = 1119257 }, + { url = "https://files.pythonhosted.org/packages/f0/b4/2668eae81d3357d7753d7b3d05eea0ff7c38216b9a93db8303ebaf3e6c12/moyopy-0.4.2-cp39-abi3-win_amd64.whl", hash = 
"sha256:2c4f08e0b194845196eb106b79fe175fda69c86c4d8986e9c347ebbb31ea6980", size = 787309 }, +] + [[package]] name = "mpmath" version = "1.3.0" @@ -1185,6 +1314,33 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/d2/1d/1b658dbd2b9fa9c4c9f32accbfc0205d532c8c6194dc0f2a4c0428e7128a/nodeenv-1.9.1-py2.py3-none-any.whl", hash = "sha256:ba11c9782d29c27c70ffbdda2d7415098754709be8a7056d79a737cd901155c9", size = 22314 }, ] +[[package]] +name = "numba" +version = "0.61.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "llvmlite" }, + { name = "numpy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/1c/a0/e21f57604304aa03ebb8e098429222722ad99176a4f979d34af1d1ee80da/numba-0.61.2.tar.gz", hash = "sha256:8750ee147940a6637b80ecf7f95062185ad8726c8c28a2295b8ec1160a196f7d", size = 2820615 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/3f/97/c99d1056aed767503c228f7099dc11c402906b42a4757fec2819329abb98/numba-0.61.2-cp311-cp311-macosx_10_14_x86_64.whl", hash = "sha256:efd3db391df53aaa5cfbee189b6c910a5b471488749fd6606c3f33fc984c2ae2", size = 2775825 }, + { url = "https://files.pythonhosted.org/packages/95/9e/63c549f37136e892f006260c3e2613d09d5120672378191f2dc387ba65a2/numba-0.61.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:49c980e4171948ffebf6b9a2520ea81feed113c1f4890747ba7f59e74be84b1b", size = 2778695 }, + { url = "https://files.pythonhosted.org/packages/97/c8/8740616c8436c86c1b9a62e72cb891177d2c34c2d24ddcde4c390371bf4c/numba-0.61.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3945615cd73c2c7eba2a85ccc9c1730c21cd3958bfcf5a44302abae0fb07bb60", size = 3829227 }, + { url = "https://files.pythonhosted.org/packages/fc/06/66e99ae06507c31d15ff3ecd1f108f2f59e18b6e08662cd5f8a5853fbd18/numba-0.61.2-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:bbfdf4eca202cebade0b7d43896978e146f39398909a42941c9303f82f403a18", size = 3523422 }, + { url = 
"https://files.pythonhosted.org/packages/0f/a4/2b309a6a9f6d4d8cfba583401c7c2f9ff887adb5d54d8e2e130274c0973f/numba-0.61.2-cp311-cp311-win_amd64.whl", hash = "sha256:76bcec9f46259cedf888041b9886e257ae101c6268261b19fda8cfbc52bec9d1", size = 2831505 }, + { url = "https://files.pythonhosted.org/packages/b4/a0/c6b7b9c615cfa3b98c4c63f4316e3f6b3bbe2387740277006551784218cd/numba-0.61.2-cp312-cp312-macosx_10_14_x86_64.whl", hash = "sha256:34fba9406078bac7ab052efbf0d13939426c753ad72946baaa5bf9ae0ebb8dd2", size = 2776626 }, + { url = "https://files.pythonhosted.org/packages/92/4a/fe4e3c2ecad72d88f5f8cd04e7f7cff49e718398a2fac02d2947480a00ca/numba-0.61.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4ddce10009bc097b080fc96876d14c051cc0c7679e99de3e0af59014dab7dfe8", size = 2779287 }, + { url = "https://files.pythonhosted.org/packages/9a/2d/e518df036feab381c23a624dac47f8445ac55686ec7f11083655eb707da3/numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5b1bb509d01f23d70325d3a5a0e237cbc9544dd50e50588bc581ba860c213546", size = 3885928 }, + { url = "https://files.pythonhosted.org/packages/10/0f/23cced68ead67b75d77cfcca3df4991d1855c897ee0ff3fe25a56ed82108/numba-0.61.2-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:48a53a3de8f8793526cbe330f2a39fe9a6638efcbf11bd63f3d2f9757ae345cd", size = 3577115 }, + { url = "https://files.pythonhosted.org/packages/68/1d/ddb3e704c5a8fb90142bf9dc195c27db02a08a99f037395503bfbc1d14b3/numba-0.61.2-cp312-cp312-win_amd64.whl", hash = "sha256:97cf4f12c728cf77c9c1d7c23707e4d8fb4632b46275f8f3397de33e5877af18", size = 2831929 }, + { url = "https://files.pythonhosted.org/packages/0b/f3/0fe4c1b1f2569e8a18ad90c159298d862f96c3964392a20d74fc628aee44/numba-0.61.2-cp313-cp313-macosx_10_14_x86_64.whl", hash = "sha256:3a10a8fc9afac40b1eac55717cece1b8b1ac0b946f5065c89e00bde646b5b154", size = 2771785 }, + { url = 
"https://files.pythonhosted.org/packages/e9/71/91b277d712e46bd5059f8a5866862ed1116091a7cb03bd2704ba8ebe015f/numba-0.61.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:7d3bcada3c9afba3bed413fba45845f2fb9cd0d2b27dd58a1be90257e293d140", size = 2773289 }, + { url = "https://files.pythonhosted.org/packages/0d/e0/5ea04e7ad2c39288c0f0f9e8d47638ad70f28e275d092733b5817cf243c9/numba-0.61.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:bdbca73ad81fa196bd53dc12e3aaf1564ae036e0c125f237c7644fe64a4928ab", size = 3893918 }, + { url = "https://files.pythonhosted.org/packages/17/58/064f4dcb7d7e9412f16ecf80ed753f92297e39f399c905389688cf950b81/numba-0.61.2-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:5f154aaea625fb32cfbe3b80c5456d514d416fcdf79733dd69c0df3a11348e9e", size = 3584056 }, + { url = "https://files.pythonhosted.org/packages/af/a4/6d3a0f2d3989e62a18749e1e9913d5fa4910bbb3e3311a035baea6caf26d/numba-0.61.2-cp313-cp313-win_amd64.whl", hash = "sha256:59321215e2e0ac5fa928a8020ab00b8e57cda8a97384963ac0dfa4d4e6aa54e7", size = 2831846 }, +] + [[package]] name = "numpy" version = "2.2.4" @@ -1233,6 +1389,124 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/3e/05/eb7eec66b95cf697f08c754ef26c3549d03ebd682819f794cb039574a0a6/numpy-2.2.4-cp313-cp313t-win_amd64.whl", hash = "sha256:188dcbca89834cc2e14eb2f106c96d6d46f200fe0200310fc29089657379c58d", size = 12739119 }, ] +[[package]] +name = "nvidia-cublas-cu12" +version = "12.4.5.8" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ae/71/1c91302526c45ab494c23f61c7a84aa568b8c1f9d196efa5993957faf906/nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl", hash = "sha256:2fc8da60df463fdefa81e323eef2e36489e1c94335b5358bcb38360adf75ac9b", size = 363438805 }, +] + +[[package]] +name = "nvidia-cuda-cupti-cu12" +version = "12.4.127" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/67/42/f4f60238e8194a3106d06a058d494b18e006c10bb2b915655bd9f6ea4cb1/nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl", hash = "sha256:9dec60f5ac126f7bb551c055072b69d85392b13311fcc1bcda2202d172df30fb", size = 13813957 }, +] + +[[package]] +name = "nvidia-cuda-nvrtc-cu12" +version = "12.4.127" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/2c/14/91ae57cd4db3f9ef7aa99f4019cfa8d54cb4caa7e00975df6467e9725a9f/nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl", hash = "sha256:a178759ebb095827bd30ef56598ec182b85547f1508941a3d560eb7ea1fbf338", size = 24640306 }, +] + +[[package]] +name = "nvidia-cuda-runtime-cu12" +version = "12.4.127" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ea/27/1795d86fe88ef397885f2e580ac37628ed058a92ed2c39dc8eac3adf0619/nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl", hash = "sha256:64403288fa2136ee8e467cdc9c9427e0434110899d07c779f25b5c068934faa5", size = 883737 }, +] + +[[package]] +name = "nvidia-cudnn-cu12" +version = "9.1.0.70" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "nvidia-cublas-cu12", marker = "sys_platform != 'win32'" }, +] +wheels = [ + { url = "https://files.pythonhosted.org/packages/9f/fd/713452cd72343f682b1c7b9321e23829f00b842ceaedcda96e742ea0b0b3/nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl", hash = "sha256:165764f44ef8c61fcdfdfdbe769d687e06374059fbb388b6c89ecb0e28793a6f", size = 664752741 }, +] + +[[package]] +name = "nvidia-cufft-cu12" +version = "11.2.1.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "nvidia-nvjitlink-cu12", marker = "sys_platform != 'win32'" }, +] +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/27/94/3266821f65b92b3138631e9c8e7fe1fb513804ac934485a8d05776e1dd43/nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl", hash = "sha256:f083fc24912aa410be21fa16d157fed2055dab1cc4b6934a0e03cba69eb242b9", size = 211459117 }, +] + +[[package]] +name = "nvidia-curand-cu12" +version = "10.3.5.147" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/8a/6d/44ad094874c6f1b9c654f8ed939590bdc408349f137f9b98a3a23ccec411/nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl", hash = "sha256:a88f583d4e0bb643c49743469964103aa59f7f708d862c3ddb0fc07f851e3b8b", size = 56305206 }, +] + +[[package]] +name = "nvidia-cusolver-cu12" +version = "11.6.1.9" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "nvidia-cublas-cu12", marker = "sys_platform != 'win32'" }, + { name = "nvidia-cusparse-cu12", marker = "sys_platform != 'win32'" }, + { name = "nvidia-nvjitlink-cu12", marker = "sys_platform != 'win32'" }, +] +wheels = [ + { url = "https://files.pythonhosted.org/packages/3a/e1/5b9089a4b2a4790dfdea8b3a006052cfecff58139d5a4e34cb1a51df8d6f/nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl", hash = "sha256:19e33fa442bcfd085b3086c4ebf7e8debc07cfe01e11513cc6d332fd918ac260", size = 127936057 }, +] + +[[package]] +name = "nvidia-cusparse-cu12" +version = "12.3.1.170" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "nvidia-nvjitlink-cu12", marker = "sys_platform != 'win32'" }, +] +wheels = [ + { url = "https://files.pythonhosted.org/packages/db/f7/97a9ea26ed4bbbfc2d470994b8b4f338ef663be97b8f677519ac195e113d/nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl", hash = "sha256:ea4f11a2904e2a8dc4b1833cc1b5181cde564edd0d5cd33e3c168eff2d1863f1", size = 207454763 }, +] + +[[package]] +name = "nvidia-cusparselt-cu12" +version = "0.6.2" +source = { registry = "https://pypi.org/simple" } +wheels 
= [ + { url = "https://files.pythonhosted.org/packages/78/a8/bcbb63b53a4b1234feeafb65544ee55495e1bb37ec31b999b963cbccfd1d/nvidia_cusparselt_cu12-0.6.2-py3-none-manylinux2014_x86_64.whl", hash = "sha256:df2c24502fd76ebafe7457dbc4716b2fec071aabaed4fb7691a201cde03704d9", size = 150057751 }, +] + +[[package]] +name = "nvidia-nccl-cu12" +version = "2.21.5" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/df/99/12cd266d6233f47d00daf3a72739872bdc10267d0383508b0b9c84a18bb6/nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl", hash = "sha256:8579076d30a8c24988834445f8d633c697d42397e92ffc3f63fa26766d25e0a0", size = 188654414 }, +] + +[[package]] +name = "nvidia-nvjitlink-cu12" +version = "12.4.127" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ff/ff/847841bacfbefc97a00036e0fce5a0f086b640756dc38caea5e1bb002655/nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl", hash = "sha256:06b3b9b25bf3f8af351d664978ca26a16d2c5127dbd53c0497e28d1fb9611d57", size = 21066810 }, +] + +[[package]] +name = "nvidia-nvtx-cu12" +version = "12.4.127" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/87/20/199b8713428322a2f22b722c62b8cc278cc53dffa9705d744484b5035ee9/nvidia_nvtx_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl", hash = "sha256:781e950d9b9f60d8241ccea575b32f5105a5baf4c2351cab5256a24869f12a1a", size = 99144 }, +] + [[package]] name = "packaging" version = "24.2" @@ -1362,6 +1636,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/cf/6c/41c21c6c8af92b9fea313aa47c75de49e2f9a467964ee33eb0135d47eb64/pillow-11.1.0-cp313-cp313t-win_arm64.whl", hash = "sha256:67cd427c68926108778a9005f2a04adbd5e67c442ed21d95389fe1d595458756", size = 2377651 }, ] +[[package]] +name = "pip" +version = "25.0.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/70/53/b309b4a497b09655cb7e07088966881a57d082f48ac3cb54ea729fd2c6cf/pip-25.0.1.tar.gz", hash = "sha256:88f96547ea48b940a3a385494e181e29fb8637898f88d88737c5049780f196ea", size = 1950850 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c9/bc/b7db44f5f39f9d0494071bddae6880eb645970366d0a200022a1a93d57f5/pip-25.0.1-py3-none-any.whl", hash = "sha256:c46efd13b6aa8279f33f2864459c8ce587ea6a1a59ee20de055868d8f7688f7f", size = 1841526 }, +] + [[package]] name = "platformdirs" version = "4.3.7" @@ -1927,6 +2210,39 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/86/62/8d3fc3ec6640161a5649b2cddbbf2b9fa39c92541225b33f117c37c5a2eb/s3transfer-0.11.4-py3-none-any.whl", hash = "sha256:ac265fa68318763a03bf2dc4f39d5cbd6a9e178d81cc9483ad27da33637e320d", size = 84412 }, ] +[[package]] +name = "scikit-learn" +version = "1.6.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "joblib" }, + { name = "numpy" }, + { name = "scipy" }, + { name = "threadpoolctl" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/9e/a5/4ae3b3a0755f7b35a280ac90b28817d1f380318973cff14075ab41ef50d9/scikit_learn-1.6.1.tar.gz", hash = "sha256:b4fc2525eca2c69a59260f583c56a7557c6ccdf8deafdba6e060f94c1c59738e", size = 7068312 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/6c/2a/e291c29670795406a824567d1dfc91db7b699799a002fdaa452bceea8f6e/scikit_learn-1.6.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:72abc587c75234935e97d09aa4913a82f7b03ee0b74111dcc2881cba3c5a7b33", size = 12102620 }, + { url = "https://files.pythonhosted.org/packages/25/92/ee1d7a00bb6b8c55755d4984fd82608603a3cc59959245068ce32e7fb808/scikit_learn-1.6.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:b3b00cdc8f1317b5f33191df1386c0befd16625f49d979fe77a8d44cae82410d", size = 11116234 }, + { url = 
"https://files.pythonhosted.org/packages/30/cd/ed4399485ef364bb25f388ab438e3724e60dc218c547a407b6e90ccccaef/scikit_learn-1.6.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:dc4765af3386811c3ca21638f63b9cf5ecf66261cc4815c1db3f1e7dc7b79db2", size = 12592155 }, + { url = "https://files.pythonhosted.org/packages/a8/f3/62fc9a5a659bb58a03cdd7e258956a5824bdc9b4bb3c5d932f55880be569/scikit_learn-1.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:25fc636bdaf1cc2f4a124a116312d837148b5e10872147bdaf4887926b8c03d8", size = 13497069 }, + { url = "https://files.pythonhosted.org/packages/a1/a6/c5b78606743a1f28eae8f11973de6613a5ee87366796583fb74c67d54939/scikit_learn-1.6.1-cp311-cp311-win_amd64.whl", hash = "sha256:fa909b1a36e000a03c382aade0bd2063fd5680ff8b8e501660c0f59f021a6415", size = 11139809 }, + { url = "https://files.pythonhosted.org/packages/0a/18/c797c9b8c10380d05616db3bfb48e2a3358c767affd0857d56c2eb501caa/scikit_learn-1.6.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:926f207c804104677af4857b2c609940b743d04c4c35ce0ddc8ff4f053cddc1b", size = 12104516 }, + { url = "https://files.pythonhosted.org/packages/c4/b7/2e35f8e289ab70108f8cbb2e7a2208f0575dc704749721286519dcf35f6f/scikit_learn-1.6.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:2c2cae262064e6a9b77eee1c8e768fc46aa0b8338c6a8297b9b6759720ec0ff2", size = 11167837 }, + { url = "https://files.pythonhosted.org/packages/a4/f6/ff7beaeb644bcad72bcfd5a03ff36d32ee4e53a8b29a639f11bcb65d06cd/scikit_learn-1.6.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1061b7c028a8663fb9a1a1baf9317b64a257fcb036dae5c8752b2abef31d136f", size = 12253728 }, + { url = "https://files.pythonhosted.org/packages/29/7a/8bce8968883e9465de20be15542f4c7e221952441727c4dad24d534c6d99/scikit_learn-1.6.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2e69fab4ebfc9c9b580a7a80111b43d214ab06250f8a7ef590a4edf72464dd86", size = 
13147700 }, + { url = "https://files.pythonhosted.org/packages/62/27/585859e72e117fe861c2079bcba35591a84f801e21bc1ab85bce6ce60305/scikit_learn-1.6.1-cp312-cp312-win_amd64.whl", hash = "sha256:70b1d7e85b1c96383f872a519b3375f92f14731e279a7b4c6cfd650cf5dffc52", size = 11110613 }, + { url = "https://files.pythonhosted.org/packages/2e/59/8eb1872ca87009bdcdb7f3cdc679ad557b992c12f4b61f9250659e592c63/scikit_learn-1.6.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:2ffa1e9e25b3d93990e74a4be2c2fc61ee5af85811562f1288d5d055880c4322", size = 12010001 }, + { url = "https://files.pythonhosted.org/packages/9d/05/f2fc4effc5b32e525408524c982c468c29d22f828834f0625c5ef3d601be/scikit_learn-1.6.1-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:dc5cf3d68c5a20ad6d571584c0750ec641cc46aeef1c1507be51300e6003a7e1", size = 11096360 }, + { url = "https://files.pythonhosted.org/packages/c8/e4/4195d52cf4f113573fb8ebc44ed5a81bd511a92c0228889125fac2f4c3d1/scikit_learn-1.6.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c06beb2e839ecc641366000ca84f3cf6fa9faa1777e29cf0c04be6e4d096a348", size = 12209004 }, + { url = "https://files.pythonhosted.org/packages/94/be/47e16cdd1e7fcf97d95b3cb08bde1abb13e627861af427a3651fcb80b517/scikit_learn-1.6.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e8ca8cb270fee8f1f76fa9bfd5c3507d60c6438bbee5687f81042e2bb98e5a97", size = 13171776 }, + { url = "https://files.pythonhosted.org/packages/34/b0/ca92b90859070a1487827dbc672f998da95ce83edce1270fc23f96f1f61a/scikit_learn-1.6.1-cp313-cp313-win_amd64.whl", hash = "sha256:7a1c43c8ec9fde528d664d947dc4c0789be4077a3647f232869f41d9bf50e0fb", size = 11071865 }, + { url = "https://files.pythonhosted.org/packages/12/ae/993b0fb24a356e71e9a894e42b8a9eec528d4c70217353a1cd7a48bc25d4/scikit_learn-1.6.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:a17c1dea1d56dcda2fac315712f3651a1fea86565b64b48fa1bc090249cbf236", size = 11955804 }, + { url = 
"https://files.pythonhosted.org/packages/d6/54/32fa2ee591af44507eac86406fa6bba968d1eb22831494470d0a2e4a1eb1/scikit_learn-1.6.1-cp313-cp313t-macosx_12_0_arm64.whl", hash = "sha256:6a7aa5f9908f0f28f4edaa6963c0a6183f1911e63a69aa03782f0d924c830a35", size = 11100530 }, + { url = "https://files.pythonhosted.org/packages/3f/58/55856da1adec655bdce77b502e94a267bf40a8c0b89f8622837f89503b5a/scikit_learn-1.6.1-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0650e730afb87402baa88afbf31c07b84c98272622aaba002559b614600ca691", size = 12433852 }, + { url = "https://files.pythonhosted.org/packages/ff/4f/c83853af13901a574f8f13b645467285a48940f185b690936bb700a50863/scikit_learn-1.6.1-cp313-cp313t-win_amd64.whl", hash = "sha256:3f59fe08dc03ea158605170eb52b22a105f238a5d512c4470ddeca71feae8e5f", size = 11337256 }, +] + [[package]] name = "scipy" version = "1.15.2" @@ -1974,6 +2290,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/0a/c8/b3f566db71461cabd4b2d5b39bcc24a7e1c119535c8361f81426be39bb47/scipy-1.15.2-cp313-cp313t-win_amd64.whl", hash = "sha256:fe8a9eb875d430d81755472c5ba75e84acc980e4a8f6204d402849234d3017db", size = 40477705 }, ] +[[package]] +name = "setuptools" +version = "78.1.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/a9/5a/0db4da3bc908df06e5efae42b44e75c81dd52716e10192ff36d0c1c8e379/setuptools-78.1.0.tar.gz", hash = "sha256:18fd474d4a82a5f83dac888df697af65afa82dec7323d09c3e37d1f14288da54", size = 1367827 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/54/21/f43f0a1fa8b06b32812e0975981f4677d28e0f3271601dc88ac5a5b83220/setuptools-78.1.0-py3-none-any.whl", hash = "sha256:3e386e96793c8702ae83d17b853fb93d3e09ef82ec62722e61da5cd22376dcd8", size = 1256108 }, +] + [[package]] name = "shibuya" version = "2025.3.24" @@ -2225,16 +2550,31 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/f1/7b/ce1eafaf1a76852e2ec9b22edecf1daa58175c090266e9f6c64afcd81d91/stack_data-0.6.3-py3-none-any.whl", hash = "sha256:d5558e0c25a4cb0853cddad3d77da9891a08cb85dd9f9f91b9f8cd66e511e695", size = 24521 }, ] +[[package]] +name = "structuregraph-helpers" +version = "0.0.9" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "click" }, + { name = "loguru" }, + { name = "pymatgen" }, + { name = "pyyaml" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/78/41/b5fd49d4c5dcdb944d84b7a25f57d7f66ec0e591233d97192e5f9d7d0250/structuregraph_helpers-0.0.9.tar.gz", hash = "sha256:b7e05a080c832c53fc7d48f8a91653c95ec4ea79e5e199dc6d2e3adb6a6eb644", size = 94754 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ab/7c/d74f36987288388272d92740f3895e831350a0d2939bd12b3a112265c57a/structuregraph_helpers-0.0.9-py3-none-any.whl", hash = "sha256:6e872f8b6d0ca0ab8c9f3ae52b2014947f6f2f2d94527619ac9bc7173de21b60", size = 71692 }, +] + [[package]] name = "sympy" -version = "1.13.3" +version = "1.13.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "mpmath" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/11/8a/5a7fd6284fa8caac23a26c9ddf9c30485a48169344b4bd3b0f02fef1890f/sympy-1.13.3.tar.gz", hash = "sha256:b27fd2c6530e0ab39e275fc9b683895367e51d5da91baa8d3d64db2565fec4d9", size = 7533196 } +sdist = { url = "https://files.pythonhosted.org/packages/ca/99/5a5b6f19ff9f083671ddf7b9632028436167cd3d33e11015754e41b249a4/sympy-1.13.1.tar.gz", hash = "sha256:9cebf7e04ff162015ce31c9c6c9144daa34a93bd082f54fd8f12deca4f47515f", size = 7533040 } wheels = [ - { url = "https://files.pythonhosted.org/packages/99/ff/c87e0622b1dadea79d2fb0b25ade9ed98954c9033722eb707053d310d4f3/sympy-1.13.3-py3-none-any.whl", hash = "sha256:54612cf55a62755ee71824ce692986f23c88ffa77207b30c1368eda4a7060f73", size = 6189483 }, + { url = 
"https://files.pythonhosted.org/packages/b2/fe/81695a1aa331a842b582453b605175f419fe8540355886031328089d840a/sympy-1.13.1-py3-none-any.whl", hash = "sha256:db36cdc64bf61b9b24578b6f7bab1ecdd2452cf008f34faa33776680c26d66f8", size = 6189177 }, ] [[package]] @@ -2246,6 +2586,57 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/40/44/4a5f08c96eb108af5cb50b41f76142f0afa346dfa99d5296fe7202a11854/tabulate-0.9.0-py3-none-any.whl", hash = "sha256:024ca478df22e9340661486f85298cff5f6dcdba14f3813e8830015b9ed1948f", size = 35252 }, ] +[[package]] +name = "threadpoolctl" +version = "3.6.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/b7/4d/08c89e34946fce2aec4fbb45c9016efd5f4d7f24af8e5d93296e935631d8/threadpoolctl-3.6.0.tar.gz", hash = "sha256:8ab8b4aa3491d812b623328249fab5302a68d2d71745c8a4c719a2fcaba9f44e", size = 21274 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/32/d5/f9a850d79b0851d1d4ef6456097579a9005b31fea68726a4ae5f2d82ddd9/threadpoolctl-3.6.0-py3-none-any.whl", hash = "sha256:43a0b8fd5a2928500110039e43a5eed8480b918967083ea48dc3ab9f13c4a7fb", size = 18638 }, +] + +[[package]] +name = "torch" +version = "2.6.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "filelock" }, + { name = "fsspec" }, + { name = "jinja2" }, + { name = "networkx" }, + { name = "nvidia-cublas-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cuda-cupti-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cuda-nvrtc-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cuda-runtime-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cudnn-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cufft-cu12", marker = "platform_machine == 'x86_64' and 
sys_platform == 'linux'" }, + { name = "nvidia-curand-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cusolver-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cusparse-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cusparselt-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-nccl-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-nvjitlink-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-nvtx-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "setuptools", marker = "python_full_version >= '3.12'" }, + { name = "sympy" }, + { name = "triton", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "typing-extensions" }, +] +wheels = [ + { url = "https://files.pythonhosted.org/packages/78/a9/97cbbc97002fff0de394a2da2cdfa859481fdca36996d7bd845d50aa9d8d/torch-2.6.0-cp311-cp311-manylinux1_x86_64.whl", hash = "sha256:7979834102cd5b7a43cc64e87f2f3b14bd0e1458f06e9f88ffa386d07c7446e1", size = 766715424 }, + { url = "https://files.pythonhosted.org/packages/6d/fa/134ce8f8a7ea07f09588c9cc2cea0d69249efab977707cf67669431dcf5c/torch-2.6.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:ccbd0320411fe1a3b3fec7b4d3185aa7d0c52adac94480ab024b5c8f74a0bf1d", size = 95759416 }, + { url = "https://files.pythonhosted.org/packages/11/c5/2370d96b31eb1841c3a0883a492c15278a6718ccad61bb6a649c80d1d9eb/torch-2.6.0-cp311-cp311-win_amd64.whl", hash = "sha256:46763dcb051180ce1ed23d1891d9b1598e07d051ce4c9d14307029809c4d64f7", size = 204164970 }, + { url = "https://files.pythonhosted.org/packages/0b/fa/f33a4148c6fb46ca2a3f8de39c24d473822d5774d652b66ed9b1214da5f7/torch-2.6.0-cp311-none-macosx_11_0_arm64.whl", hash = 
"sha256:94fc63b3b4bedd327af588696559f68c264440e2503cc9e6954019473d74ae21", size = 66530713 }, + { url = "https://files.pythonhosted.org/packages/e5/35/0c52d708144c2deb595cd22819a609f78fdd699b95ff6f0ebcd456e3c7c1/torch-2.6.0-cp312-cp312-manylinux1_x86_64.whl", hash = "sha256:2bb8987f3bb1ef2675897034402373ddfc8f5ef0e156e2d8cfc47cacafdda4a9", size = 766624563 }, + { url = "https://files.pythonhosted.org/packages/01/d6/455ab3fbb2c61c71c8842753b566012e1ed111e7a4c82e0e1c20d0c76b62/torch-2.6.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:b789069020c5588c70d5c2158ac0aa23fd24a028f34a8b4fcb8fcb4d7efcf5fb", size = 95607867 }, + { url = "https://files.pythonhosted.org/packages/18/cf/ae99bd066571656185be0d88ee70abc58467b76f2f7c8bfeb48735a71fe6/torch-2.6.0-cp312-cp312-win_amd64.whl", hash = "sha256:7e1448426d0ba3620408218b50aa6ada88aeae34f7a239ba5431f6c8774b1239", size = 204120469 }, + { url = "https://files.pythonhosted.org/packages/81/b4/605ae4173aa37fb5aa14605d100ff31f4f5d49f617928c9f486bb3aaec08/torch-2.6.0-cp312-none-macosx_11_0_arm64.whl", hash = "sha256:9a610afe216a85a8b9bc9f8365ed561535c93e804c2a317ef7fabcc5deda0989", size = 66532538 }, + { url = "https://files.pythonhosted.org/packages/24/85/ead1349fc30fe5a32cadd947c91bda4a62fbfd7f8c34ee61f6398d38fb48/torch-2.6.0-cp313-cp313-manylinux1_x86_64.whl", hash = "sha256:4874a73507a300a5d089ceaff616a569e7bb7c613c56f37f63ec3ffac65259cf", size = 766626191 }, + { url = "https://files.pythonhosted.org/packages/dd/b0/26f06f9428b250d856f6d512413e9e800b78625f63801cbba13957432036/torch-2.6.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:a0d5e1b9874c1a6c25556840ab8920569a7a4137afa8a63a32cee0bc7d89bd4b", size = 95611439 }, + { url = "https://files.pythonhosted.org/packages/c2/9c/fc5224e9770c83faed3a087112d73147cd7c7bfb7557dcf9ad87e1dda163/torch-2.6.0-cp313-cp313-win_amd64.whl", hash = "sha256:510c73251bee9ba02ae1cb6c9d4ee0907b3ce6020e62784e2d7598e0cfa4d6cc", size = 204126475 }, + { url = 
"https://files.pythonhosted.org/packages/88/8b/d60c0491ab63634763be1537ad488694d316ddc4a20eaadd639cedc53971/torch-2.6.0-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:ff96f4038f8af9f7ec4231710ed4549da1bdebad95923953a25045dcf6fd87e2", size = 66536783 }, +] + [[package]] name = "tqdm" version = "4.67.1" @@ -2267,6 +2658,16 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/00/c0/8f5d070730d7836adc9c9b6408dec68c6ced86b304a9b26a14df072a6e8c/traitlets-5.14.3-py3-none-any.whl", hash = "sha256:b74e89e397b1ed28cc831db7aea759ba6640cb3de13090ca145426688ff1ac4f", size = 85359 }, ] +[[package]] +name = "triton" +version = "3.2.0" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a7/2e/757d2280d4fefe7d33af7615124e7e298ae7b8e3bc4446cdb8e88b0f9bab/triton-3.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8009a1fb093ee8546495e96731336a33fb8856a38e45bb4ab6affd6dbc3ba220", size = 253157636 }, + { url = "https://files.pythonhosted.org/packages/06/00/59500052cb1cf8cf5316be93598946bc451f14072c6ff256904428eaf03c/triton-3.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8d9b215efc1c26fa7eefb9a157915c92d52e000d2bf83e5f69704047e63f125c", size = 253159365 }, + { url = "https://files.pythonhosted.org/packages/c7/30/37a3384d1e2e9320331baca41e835e90a3767303642c7a80d4510152cbcf/triton-3.2.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e5dfa23ba84541d7c0a531dfce76d8bcd19159d50a4a8b14ad01e91734a5c1b0", size = 253154278 }, +] + [[package]] name = "typing-extensions" version = "4.13.0" @@ -2326,6 +2727,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/fd/84/fd2ba7aafacbad3c4201d395674fc6348826569da3c0937e75505ead3528/wcwidth-0.2.13-py2.py3-none-any.whl", hash = "sha256:3da69048e4540d84af32131829ff948f1e022c1c6bdb8d6102117aac784f6859", size = 34166 }, ] +[[package]] +name = "win32-setctime" +version = "1.2.0" 
+source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/b3/8f/705086c9d734d3b663af0e9bb3d4de6578d08f46b1b101c2442fd9aecaa2/win32_setctime-1.2.0.tar.gz", hash = "sha256:ae1fdf948f5640aae05c511ade119313fb6a30d7eabe25fef9764dca5873c4c0", size = 4867 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/e1/07/c6fe3ad3e685340704d314d765b7912993bcb8dc198f0e7a89382d37974b/win32_setctime-1.2.0-py3-none-any.whl", hash = "sha256:95d644c4e708aba81dc3704a116d8cbc974d70b3bdb8be1d150e36be6e9d1390", size = 4083 }, +] + [[package]] name = "xxhash" version = "3.5.0"