Add FileContainer search_intersection method #169
base: main
Codecov Report: All modified and coverable lines are covered by tests ✅
Would it make sense to require two

```python
import filefisher

test_paths = [
    "historical/tas",
    "historical/hfds",
    "ssp585/hfds",
]
ff = filefisher.FileFinder("{scen}", "{variable}", test_paths=test_paths)
fc_tas = ff.find_files(variable="tas")
fc_hfds = ff.find_files(variable="hfds")
fc_tas.intersect(fc_hfds, on="variable")
```

where the

The implementation could be along the lines of

```python
import pandas as pd


def intersect(df_l: pd.DataFrame, df_r: pd.DataFrame, on: str) -> pd.DataFrame:
    # both frames must have the same keys and a single value of `on` each
    assert df_l.columns.equals(df_r.columns)
    assert df_l[on].nunique() == 1
    assert df_r[on].nunique() == 1

    # build a MultiIndex from all keys except `on`
    columns = df_l.columns.drop(on)
    mi_l = pd.MultiIndex.from_frame(df_l[columns])
    mi_r = pd.MultiIndex.from_frame(df_r[columns])

    # key combinations present in both frames
    sel = mi_l.intersection(mi_r)

    # boolean masks of the rows whose keys are in the intersection
    # (`get_locs` expects one selector per level, not a list of tuples)
    l = df_l[mi_l.isin(sel)]
    r = df_r[mi_r.isin(sel)]
    return pd.concat([l, r])


intersect(fc_tas.df, fc_hfds.df, on="variable")
```
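For comparison, the same selection can also be written as an inner `merge` on all keys except `on`, instead of going through a `MultiIndex`. This is only a sketch on plain `pandas.DataFrame`s: the helper name `intersect_merge` is hypothetical and not part of filefisher, and it assumes, like the function above, that every column of the frames is a search key.

```python
import pandas as pd


def intersect_merge(df_l: pd.DataFrame, df_r: pd.DataFrame, on: str) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose non-`on` keys occur in both frames."""
    if not df_l.columns.equals(df_r.columns):
        raise ValueError("df_l and df_r must have the same columns")
    columns = list(df_l.columns.drop(on))

    # unique key combinations (without `on`) present in both frames
    common = (
        df_l[columns]
        .drop_duplicates()
        .merge(df_r[columns].drop_duplicates(), on=columns, how="inner")
    )

    # inner-join each side against the common combinations
    l = df_l.merge(common, on=columns, how="inner")
    r = df_r.merge(common, on=columns, how="inner")
    return pd.concat([l, r], ignore_index=True)
```

Unlike the `MultiIndex` version, `merge` returns a fresh default index, so anything stored in the index (for example file paths, if that is where they live) would be dropped unless it is moved into a column first and excluded from the join keys.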
Right. That is also nice. My application was one where if did Yours is easier to understand. I am thinking about possible advantages of my implementation... With mine only the entries of the
Yet another idea would be to combine this as a `filefisher.align(fc.groupby(on="variable"), except="variable")`, but then we have to pass the
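The `align` idea generalizes to more than two values of `on`: a key combination is kept only if it occurs for every value of `on`. Below is a minimal sketch with a plain `groupby` on a `pandas.DataFrame`; the name `align_groups` is hypothetical, and `filefisher.align` as written in the comment is only a proposal.

```python
import pandas as pd


def align_groups(df: pd.DataFrame, on: str) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose other keys occur for every value of `on`."""
    columns = list(df.columns.drop(on))
    n_values = df[on].nunique()

    # for each row: how many distinct values of `on` share its other keys
    counts = df.groupby(columns)[on].transform("nunique")

    # keep only combinations available for all values of `on`
    return df[counts == n_values]
```

Because `transform` returns a result aligned with the original rows, the index of `df` is preserved. Note that `except` is a reserved keyword in Python, so a real signature would need a different parameter name.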
Here is a method that enables searching for intersecting values of one key across all values of another key. A specific use case is, for example: "Which scenarios and members are available for both variables tas and hfds?"
The usage would be to find all available scenarios and members for both variables and then search the resulting `FileContainer` for intersecting values along `scenario_member`. In this case `search_key = variable` and `intersect_key = scenario_member`. I chose this approach because it felt relatively straightforward, more so than implementing it in `FileFinder`.

I am not very happy with the names and my explanation in the docstring, but at least the example should make it quite clear.
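The behaviour described above can be sketched on a plain `pandas.DataFrame`: collect the set of `intersect_key` values for each `search_key` value and keep only the rows whose value is in all of these sets. This is an illustration under assumptions, not the code in this PR; in particular it assumes a combined `scenario_member` column already exists, and only `intersect_key` is compared while other columns are ignored.

```python
import pandas as pd


def search_intersection(df: pd.DataFrame, search_key: str, intersect_key: str) -> pd.DataFrame:
    """Sketch: keep rows whose `intersect_key` value occurs for every `search_key` value."""
    # one set of intersect_key values per search_key value (e.g. per variable)
    groups = [set(values) for _, values in df.groupby(search_key)[intersect_key]]

    # values available for all search_key values
    common = set.intersection(*groups) if groups else set()

    return df[df[intersect_key].isin(common)]
```

For example, with `search_key="variable"` and `intersect_key="scenario_member"`, a member that exists for `hfds` but not for `tas` is dropped from both variables.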