ENH: Use pydantic models to represent ESPEI datasets #269

bocklund · 2025-08-18T02:38:15Z

Implements pydantic models for the currently supported datasets. This should make it easier to implement new types of datasets that conform to normal expectations, make the datasets more usable when implementing residual functions, and provide a sane path for refactoring/removing/deprecating dataset types if that's ever needed*.

In this PR, pydantic is used only for validation within the dataset loader. It replaces check_dataset and clean_dataset, which are now deprecated and scheduled for removal in ESPEI 0.11; their validations were migrated to the pydantic models. For now, the pydantic objects are not used anywhere else in the code. New code should use the pydantic objects instead of the arbitrary dictionary representations, and existing code should migrate as it is updated.
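
As a rough illustration of the validate-then-dump flow described above, here is a minimal sketch assuming pydantic v2. `ToyDataset`, its fields, the `components_unique` check, and `load_dataset` are hypothetical names chosen for this example; they are not the models or loader functions added in this PR.

```python
from pydantic import BaseModel, ValidationError, model_validator


class ToyDataset(BaseModel):
    """Hypothetical stand-in for a dataset model (not ESPEI's actual class)."""

    components: list[str]
    phases: list[str]
    output: str
    values: list[float]
    reference: str = ""

    @model_validator(mode="after")
    def components_unique(self):
        # Example of a check that could otherwise live in a standalone
        # check_dataset-style function.
        if len(set(self.components)) != len(self.components):
            raise ValueError(f"components must be unique, got {self.components}")
        return self


def load_dataset(raw: dict) -> dict:
    """Validate a raw dataset dict and return a plain dict for storage."""
    model = ToyDataset.model_validate(raw)
    # Dump back to a dict so downstream code that expects dictionaries
    # (for example a TinyDB-style store) keeps working unchanged.
    return model.model_dump()


raw = {
    "components": ["CU", "MG"],
    "phases": ["LIQUID"],
    "output": "HM_MIX",
    "values": [-1000, -2500.0],
}
clean = load_dataset(raw)

# A faulty dataset raises a ValidationError instead of passing through.
try:
    load_dataset({**raw, "components": ["CU", "CU"]})
except ValidationError as exc:
    error_count = exc.error_count()
```

Because the loader still returns plain dicts in this sketch, callers would not need to change until they opt in to the model objects.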

Some things to do before merging:

check existing datasets against strict mode to make sure we didn't miss anything. We probably won't use strict mode in production, both to ease development and to let users keep their own arbitrary keys (comment, etc.)
try to make some purposely faulty datasets - are the error messages useful? Can we make them more useful? Does adding Field(..., description="...") help at all?
(it would be nice to) update the web documentation for datasets to include typed versions. Ideally we could autogenerate typed schemas with descriptions, but I am skeptical that anything autogenerated would be more human-readable. The docs could perhaps be shortened to the typed version with explanations of the fields, followed by several copy-pastable examples in place of the long, mixed prose that is there now.
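
The to-do items above could be explored with something like the following sketch, assuming pydantic v2. It assumes that "strict mode" here means rejecting keys the model does not declare (`extra="forbid"`); in pydantic's own terminology, strict mode refers to disabling type coercion, so this reading is an assumption. `LenientDataset` and `StrictDataset` are hypothetical names, not classes from this PR.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class LenientDataset(BaseModel):
    # Unknown keys such as "comment" are kept rather than rejected.
    model_config = ConfigDict(extra="allow")

    components: list[str] = Field(description="Element names, e.g. ['CU', 'MG']")
    values: list[float] = Field(description="Measured values, one per condition")


class StrictDataset(LenientDataset):
    # Same fields, but any key not declared above is a validation error.
    model_config = ConfigDict(extra="forbid")


raw = {
    "components": ["CU", "MG"],
    "values": [1.0],
    "comment": "digitized from Fig. 3",
}

# Lenient validation accepts the user-supplied "comment" key.
lenient = LenientDataset.model_validate(raw)

# The strict variant reports the undeclared key.
try:
    StrictDataset.model_validate(raw)
except ValidationError as exc:
    strict_errors = exc.errors()

# Field descriptions are carried into the generated JSON schema, which is
# one possible source for autogenerated, typed documentation.
schema = LenientDataset.model_json_schema()
```

Running existing datasets through a forbidding variant like this would flag undeclared keys, while the lenient variant stays closer to what users see in production. The sketch only shows that descriptions reach the JSON schema; it does not settle whether they make error messages more useful.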

*Eventually I'd like to:

merge activity into equilibrium property datasets
refactor fixed configuration datasets to use more uniform configurations and occupancies; that is, require sublattices with a single occupant to be wrapped in a list as well, i.e. change the type from list[list[float | list[float]]] to just list[list[list[float]]]
reconsider how broadcasting works in general
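
To make the proposed occupancy change concrete, the following sketch converts the mixed form into the uniform nested form. `normalize_occupancies` is a hypothetical helper written for this illustration, not a function in ESPEI.

```python
from __future__ import annotations


def normalize_occupancies(
    occupancies: list[list[float | list[float]]],
) -> list[list[list[float]]]:
    """Wrap bare single-occupant sublattices so every sublattice is a list."""
    normalized = []
    for configuration in occupancies:
        normalized.append(
            [
                # Sublattices that are already lists are copied as-is;
                # bare numbers become one-element lists.
                list(site) if isinstance(site, (list, tuple)) else [float(site)]
                for site in configuration
            ]
        )
    return normalized


mixed = [[1.0, [0.5, 0.5]], [[0.25, 0.75], 1.0]]
uniform = normalize_occupancies(mixed)
# uniform == [[[1.0], [0.5, 0.5]], [[0.25, 0.75], [1.0]]]
```

With the uniform form, code that consumes occupancies would no longer need to branch on whether a sublattice is a number or a list.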

bocklund added 14 commits August 17, 2025 12:40

Implement pydantic models for datasets

0f813bc

Currently we just pass through the models and dump them out to dicts to put in the PickleableTinyDB

Migrate ZPF-specific check_dataset functions to ZPFDataset validator

b48213c

Deprecate clean_dataset as the behavior is in pydantic now

fb9cc5d

Fix max length

b3d8e9b

Migrate check_datasets validators to pydantic models

5a0b887

Cleanup of activity check_dataset stuff

d2f6427

Activity-specific checks weren't done at all and it's all subsumed by equilibrium

Migrate equilibrium and activity check_datasets functionality to pydantic models

a3610fe

Added some new tests that were previously uncovered

Delete recursive_map as dead code

1fa6724

Ensure tags are present in the dataset models

c31d9ae

Multiple dataset cleanups:

8a9d38f

- Add __all__ for datasets
- implement to_Dataset
- deprecate check_dataset

Refactor modules back to simple datasets module

02299fd

Ruff check datasets.py

42a6e3a

Delete comment field

b01668a

Refactor types to common datasets. Datasets pretty much only have to implement a value and the validators

2ffd7d6
