Getting rid of data point if thermophysical data is not included #578
Comments
I ran into this too; it would be convenient for this to happen automatically in
@barmoral Thanks for providing a reproduction I can easily get started on. How long does this script take to run, though? It's been a few minutes (probably just fetching the data?) and I want to make sure that's not surprising.
Okay, it finished. I was just a little impatient. What columns should we drop rows based off of? This dataframe has plenty of missing pressure data, but no missing temperature or phase data. Some other columns are always missing so we can't just call
My guess is we want to consider pressure, temperature, and phase. For this data, it strips out some but not most of the dataset:
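A minimal sketch of that kind of filter, on a toy frame rather than the real dataset. The column names here are assumptions for illustration and may not match what evaluator actually produces; the only library call used is pandas' `DataFrame.dropna` with `subset`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the loaded data; the column names are assumed, not taken
# from the real data frame.
df = pd.DataFrame(
    {
        "Temperature (K)": [298.15, 298.15, 313.15, 298.15],
        "Pressure (kPa)": [101.325, np.nan, 101.325, np.nan],
        "Phase": ["Liquid", "Liquid", "Liquid", "Liquid"],
        "Density Value (g / ml)": [0.997, 0.998, np.nan, np.nan],
    }
)

# Only these columns decide whether a row is dropped. Missing property values
# (density here) do not remove a row on their own.
state_columns = ["Temperature (K)", "Pressure (kPa)", "Phase"]
cleaned = df.dropna(subset=state_columns)

print(f"kept {len(cleaned)} of {len(df)} rows")  # kept 2 of 4 rows
```

Restricting `subset` to the state columns avoids the problem with a bare `dropna()`, which would drop every row as soon as any column is always empty.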
But I wonder if you also want rows stripped out if density or osmotic coefficient (, ...) are missing?
@mattwthompson Thanks for checking this out! No, I don't mind if density or osmotic coefficients are missing. If possible, it would be helpful if the code ran even when data is missing and used the data that is actually there, instead of deleting the whole data point. If that is not possible, maybe it could report which data points are missing data and will therefore be thrown out when filtering for a specific property.
Is your feature request related to a problem? Please describe.
When using ThermoML DOIs as input data for filtering in evaluator, sometimes there are no values for pressure or temperature. Because evaluator expects these thermodynamic properties, loading and/or filtering the data will raise an error. The error arises because every value of pressure (for example) in every row is turned into a physical property object, and if there is no value there, the code breaks.
Describe the solution you'd like
It would be better if evaluator automatically removed data points without complete thermodynamic data before the code breaks, or accepted them with a warning.
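To make the request concrete, here is a rough sketch of the "remove with a warning" behavior using plain pandas. The helper `drop_incomplete_states` and the column names are hypothetical and not part of evaluator's API; this only illustrates the behavior I have in mind:

```python
import warnings

import numpy as np
import pandas as pd


def drop_incomplete_states(frame, required=("Temperature (K)", "Pressure (kPa)")):
    """Hypothetical helper: drop rows missing any required state column.

    Emits a warning listing the dropped row labels instead of failing later.
    """
    required = list(required)
    # True for every row where at least one required column is NaN.
    incomplete = frame[required].isna().any(axis=1)
    if incomplete.any():
        dropped = frame.index[incomplete].tolist()
        warnings.warn(
            f"Dropping {len(dropped)} data point(s) missing one of "
            f"{required}: rows {dropped}"
        )
    return frame.loc[~incomplete]


# Toy data with assumed column names.
example = pd.DataFrame(
    {
        "Temperature (K)": [298.15, np.nan, 313.15],
        "Pressure (kPa)": [101.325, 101.325, np.nan],
        "Density Value (g / ml)": [0.997, 0.998, np.nan],
    }
)

complete = drop_incomplete_states(example)
```

Here `complete` keeps only the first row, and the warning names rows 1 and 2, so it is clear which data points were thrown out and why.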
Describe alternatives you've considered
I manually removed the data points without complete thermodynamic data by using dropna().
Additional context
I have attached an input JSON file (sorted_dois.json) to this issue.
Here is the example Python code to replicate the error: