Add main functions for the computation of Kr maps #866
base: master
Conversation
Comments & Corrections:
Tests:
This is a large commit; it would be better to do small commits with specific changes, otherwise it is difficult to follow the flow of changes. I understand that:
Am I missing other changes? There is a comment of many lines related to time-series; I understand this is the time-evolution part, which will be re-adapted in a new PR. @carhc Are you preparing to commit more tests of the functions used in the code?
This is being done with peras
fittype : KrFitFunction
    Chosen fit function for map computation
This is no longer a parameter. Remove blank line before.
fittype : KrFitFunction
    Chosen fit function for map computation
bins    : Tuple[np.array, np.array]
    Tuple containing bins in both axis
Suggested change:
- Tuple containing bins in both axis
+ Tuple containing bins in both axes
Returns
-------
blank line
geom_comb = list(itertools.product(b_center[0], b_center[1]))
r_values  = np.array([np.sqrt(x**2+y**2) for x, y in geom_comb])
Suggested change:
- geom_comb = list(itertools.product(b_center[0], b_center[1]))
- r_values  = np.array([np.sqrt(x**2+y**2) for x, y in geom_comb])
+ geom_comb = np.array(list(itertools.product(*b_center)))
+ r_values  = np.sum(geom_comb**2, axis=1)**0.5
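As a quick sanity check (with hypothetical `b_center` values), the vectorized form produces the same radii as the original list comprehension:

```python
import itertools
import numpy as np

# hypothetical bin centers along x and y
b_center = (np.array([-1.0, 0.0, 1.0]), np.array([-2.0, 2.0]))

# original: explicit loop over the cartesian product of centers
geom_comb_list = list(itertools.product(b_center[0], b_center[1]))
r_loop = np.array([np.sqrt(x**2 + y**2) for x, y in geom_comb_list])

# suggested: vectorized over an (N, 2) array of center pairs
geom_comb = np.array(list(itertools.product(*b_center)))
r_vec = np.sum(geom_comb**2, axis=1)**0.5

assert np.allclose(r_loop, r_vec)

# the X and Y columns then come out via a simple transpose
X, Y = geom_comb.T[0], geom_comb.T[1]
```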
X = list(zip(*geom_comb))[0],
Y = list(zip(*geom_comb))[1],
Following the suggestion above...
Suggested change:
- X = list(zip(*geom_comb))[0],
- Y = list(zip(*geom_comb))[1],
+ X = geom_comb.T[0],
+ Y = geom_comb.T[1],
@given(integers(min_value = 0, max_value = 1e10),
       integers(min_value = 0, max_value = 1e10),
       arrays  (dtype = np.int64, shape = (2,),
                elements = integers(min_value = 1,
                                    max_value = 1e4)))
def test_get_number_of_bins_returns_type(nevents, thr, n_bins):

    assert type(krf.get_number_of_bins(nevents, thr) ) == np.ndarray
    assert type(krf.get_number_of_bins(n_bins=n_bins)) == np.ndarray
Remove hypothesis and use a few simple fixed cases instead. Why do we want to test the output type?
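A possible shape for the fixed-case test, using a stand-in with the call signature assumed from the snippet above (the real test would import `krf.get_number_of_bins`; the thresholds and returned values here are made up for illustration):

```python
import numpy as np

# stand-in mimicking the assumed signature of krf.get_number_of_bins;
# the branch logic and returned arrays are illustrative only
def get_number_of_bins(nevents=None, thr=None, n_bins=None):
    if n_bins is not None:
        return np.asarray(n_bins)
    return np.array([50, 50]) if nevents > thr else np.array([20, 20])

def test_get_number_of_bins_fixed_cases():
    # a couple of fixed cases instead of hypothesis strategies
    assert np.array_equal(get_number_of_bins(nevents=10**6, thr=100),
                          np.array([50, 50]))
    assert np.array_equal(get_number_of_bins(n_bins=[25, 30]),
                          np.array([25, 30]))

test_get_number_of_bins_fixed_cases()
```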
@given(n_bins=arrays(dtype = int, shape = (2,),
                     elements = integers(min_value = 2,
                                         max_value = 100)),
       n_min=integers(min_value=1, max_value=100),
       r_max=floats  (min_value=50, max_value=450))
again, a single case is enough.
'pval', 'in_active', 'has_min_counts', 'fit_success', 'valid', 'R', 'X', 'Y']

assert all(element in columns for element in df.columns.values)
Check also the number of rows in the output and rename the test to something like test_create_df_kr_map_shape.
Also, add another test checking that the non-dummy values set in the function fall in the expected range.
def valid_bin_counter(map_df             : pd.DataFrame,
                      validity_parameter : Optional[float] = 0.9):
Also, need tests for this function
def regularize_map(maps    : pd.DataFrame,
                   x2range : Tuple[float, float]):
Tests. At least:
- check it doesn't modify the input
- create a silly map with ones and a few "invalid entries", then check the invalid entries are restored to 1.
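A sketch of those two tests, using a minimal stand-in for `regularize_map` (its assumed behaviour here, replacing invalid bins with the mean of the valid ones, is an illustration only; the real tests would import the actual function):

```python
import pandas as pd

# stand-in: copy the input and overwrite invalid bins with the valid mean
def regularize_map(maps: pd.DataFrame, x2range=(0.5, 2.0)) -> pd.DataFrame:
    out = maps.copy()                    # never touch the caller's dataframe
    bad = ~maps['valid']
    out.loc[bad, 'e0'] = maps.loc[~bad, 'e0'].mean()
    return out

def test_regularize_map_does_not_modify_input():
    maps   = pd.DataFrame({'e0': [1., 1., 99.], 'valid': [True, True, False]})
    before = maps.copy()
    regularize_map(maps)
    pd.testing.assert_frame_equal(maps, before)

def test_regularize_map_restores_invalid_entries():
    # silly map of ones with a few invalid entries
    maps = pd.DataFrame({'e0'   : [1., 1., 99., 1., -5.],
                         'valid': [True, True, False, True, False]})
    out = regularize_map(maps)
    assert (out['e0'] == 1.).all()

test_regularize_map_does_not_modify_input()
test_regularize_map_restores_invalid_entries()
```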
a few more comments.
counts = counts.flatten()
bin_indexes -= 1
bin_labels = np.ravel_multi_index(bin_indexes, dims=(n_xbins, n_ybins),
                                  mode='clip', order = 'F')

return counts, bin_labels
Suggested change:
- counts = counts.flatten()
- bin_indexes -= 1
- bin_labels = np.ravel_multi_index(bin_indexes, dims=(n_xbins, n_ybins),
-                                   mode='clip', order = 'F')
- return counts, bin_labels
+ # indexes 0 and len+1 represent underflow and overflow, thus we subtract 1
+ bin_labels = np.ravel_multi_index(bin_indexes - 1, dims=(n_xbins, n_ybins),
+                                   mode='clip', order = 'F')
+ return counts.flatten(), bin_labels
If I understand correctly, mode="clip" means that bins out of range will be assigned the closest bin. Is this what we used to do, @bpalmeiro?
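A tiny illustration of that behaviour with hypothetical bin counts: an out-of-range index is silently clamped to the nearest valid bin instead of raising:

```python
import numpy as np

n_xbins, n_ybins = 3, 4

# index (2, 0) is the last valid bin along x; (5, 0) is out of range
inside  = np.ravel_multi_index((np.array([2]), np.array([0])),
                               dims=(n_xbins, n_ybins), mode='clip', order='F')
outside = np.ravel_multi_index((np.array([5]), np.array([0])),
                               dims=(n_xbins, n_ybins), mode='clip', order='F')

# with mode='clip' the overflow index is merged into the border bin
assert inside == outside
```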
In our case, we had three values for "fiducial":
- The R max for the selection (just a quality cut to ensure everything was inside).
- The R max for the map, which defines its maximum extension. In this case, if the event did not fall inside the bins, it was disregarded.
- Also (not sure this is the most appropriate place to mention it, though), we had a posterior step where the peripheral bins (those further than a fiducial r, but within said R max for map production) were set to nan.

The latter was used to ensure that, in the posterior analysis, all the hits reconstructed outside the volume were automatically flagged as nan.
That said, the best approach would be to disregard events outside the boundaries instead of artificially merging them into the border bins, even if we rely on a previous selection. It's not a very strong opinion, but I think it adds redundancy (and, therefore, a bit of robustness) to the process.
Yes, my bad... I assumed that there would not be any events outside the boundaries here, since I was expecting a "clean" dst in this part of the code, so when I defined bin_labels like that I didn't think of the consequences in case some event was actually out of range. I am thinking of a solution like this one:
Suggested change:
- counts = counts.flatten()
- bin_indexes -= 1
- bin_labels = np.ravel_multi_index(bin_indexes, dims=(n_xbins, n_ybins),
-                                   mode='clip', order = 'F')
- return counts, bin_labels
+ counts = counts.flatten()
+ bin_indexes -= 1
+ valid_mask = ( in_range(bin_indexes[0], 0, n_xbins) &
+                in_range(bin_indexes[1], 0, n_ybins) )
+ bin_labels = np.full(bin_indexes.shape[1], fill_value=np.nan)
+ bin_labels[valid_mask] = np.ravel_multi_index((bin_indexes[0, valid_mask],
+                                                bin_indexes[1, valid_mask]),
+                                               dims=(n_xbins, n_ybins),
+                                               order='F')
In case some events are outside the desired range, their label would be NaN instead of a numerical index, so when grouping the events bin by bin, those events would not be assigned to any bin.
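This relies on pandas dropping NaN group keys by default; a minimal, self-contained check of that behaviour (hypothetical labels and energies):

```python
import numpy as np
import pandas as pd

# one event carries a NaN label, standing in for an out-of-range event
df = pd.DataFrame({'bin_label': [0.0, 0.0, 1.0, np.nan],
                   'energy'   : [10., 12., 20., 999.]})

# groupby excludes NaN keys by default (dropna=True), so the
# out-of-range event does not contribute to any bin
grouped = df.groupby('bin_label')['energy'].mean()

assert list(grouped.index) == [0.0, 1.0]
assert grouped.loc[0.0] == 11.0
```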
    return valid_per


def fit_and_fill_map(map_bin : pd.DataFrame,
If I understand correctly, this is a pd.Series, not a pd.DataFrame.
outliers = new_map.in_active & ~new_map.valid

if isinstance(x2range, Tuple):
    outliers &= ~in_range(new_map.chi2, *x2range)
I think the logic is wrong here. The outliers should be those that are in the active volume and (not valid OR not in range). Truth table:

in_active  valid  x2_in_range   outlier   expected
--------------------------------------------------
True       True   True          False     False    <-- satisfied
True       True   False         False     True     <-- not satisfied
True       False  True          False     True     <-- not satisfied
True       False  False         True      True     <-- satisfied
False      x      x             False     False    <-- satisfied
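A minimal check of the corrected expression against the truth table above (column names taken from the snippet; `x2_in_range` stands for the chi2 range check):

```python
import pandas as pd

# one row per line of the truth table above
df = pd.DataFrame({'in_active'  : [True, True, True, True, False],
                   'valid'      : [True, True, False, False, False],
                   'x2_in_range': [True, False, True, False, False]})

# proposed logic: active AND (not valid OR not in range),
# written with De Morgan as active AND NOT (valid AND in range)
outlier = df.in_active & ~(df.valid & df.x2_in_range)

assert list(outlier) == [False, True, True, True, False]  # matches "expected"
```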
'pval', 'in_active', 'has_min_counts', 'fit_success', 'valid', 'R', 'X', 'Y']

assert all(element in columns for element in df.columns.values)
assert df.bin.nunique() == n_bins_x*n_bins_y
Should the length of the dataframe also be n_bins_x*n_bins_y?
def test_valid_bin_counter_warning(n_bins, rmax, validity_parameter):
    counts = np.array(range(n_bins[0]*n_bins[1]))
    krmap  = krf.create_df_kr_map(bins_x = np.linspace(-rmax, +rmax, n_bins[0]+1),
                                  bins_y = np.linspace(-rmax, +rmax, n_bins[1]+1),
                                  counts = counts,
                                  n_min  = 0,
                                  r_max  = np.nextafter(np.sqrt(2)*rmax, np.inf))

    krmap.valid.iloc[0 : 9] = True
as discussed offline, simplify the data in this test
if validity_parameter == 1:
    with warns(UserWarning, match = "inner bins are not valid."):
        krf.valid_bin_counter(map_df = krmap, validity_parameter = validity_parameter)
For validity_parameter = 0.2, you are not checking anything. Check this: https://stackoverflow.com/a/45671804
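One way to make both branches assert something, along the lines of the linked answer; valid_bin_counter is replaced here by a stand-in whose warning condition is assumed:

```python
import warnings

# stand-in: warn when the valid-bin fraction is below validity_parameter
def maybe_warn(validity_parameter, valid_fraction=0.5):
    if valid_fraction < validity_parameter:
        warnings.warn("inner bins are not valid.", UserWarning)

# branch that should warn
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    maybe_warn(validity_parameter=1.0)
n_warned = len(caught)

# branch that should NOT warn: assert explicitly that nothing was
# recorded, otherwise the validity_parameter = 0.2 case tests nothing
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    maybe_warn(validity_parameter=0.2)
n_quiet = len(caught)

assert n_warned == 1
assert n_quiet == 0
```

With pytest, the no-warning branch can equivalently use `warnings.simplefilter("error")` inside a `catch_warnings` context, so any unexpected warning fails the test outright.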
This PR addresses issue #863 (ICAROS: Calibration map creation) and builds on top of PR #865 (Data preparation functions).
It contains the main and auxiliary functions needed for the core of the Icaro city: many functions have been defined to prepare, fit and produce the Kr calibration maps.
WIP: