Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add main functions for the computation of Kr maps #866

Draft
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

carhc
Copy link
Collaborator

@carhc carhc commented Mar 8, 2024

This PR addresses issue #863 (ICAROS: Calibration map creation) and lies on top of PR #865 (Data preparation functions).

It contains the main and auxiliary functions needed for the core of Icaro city. Many different functions have been defined in order to prepare, fit and produce the Kr calibration maps.

WIP:

  • Complete tests
  • Define regularization.
  • Define how to select map bins that fall within the detector volume (maybe using DB).
  • Fix the prescription for the p-value calculation.
  • Cosmetics and unify style. Many of the functions here have been changing almost daily for the last week (even today!), so there may still be inconsistencies, especially regarding the functions message for parameters and returns.

@jahernando
Copy link
Collaborator

jahernando commented Apr 4, 2024

Comments & Corrections:

  1. get_number_of_bins ok
  2. get_binned_data what is does is: get_bin_counts_and_event_bin_id, change name!
  3. update_dst extend: update_dst_with_event_bin_id
  4. in calculate_residuals line 611, it is missing KrFitFunction in lin_function?
  5. fit_and_fill_map, I think the try is not necessary.
  6. fit_and_fill_map, I will remove the else associated with if not map_bin['in_active'] or not map_bin['has_min_counts']:, and put the rest of the code (most of the function body) in the first identation level.
  7. fit_and_fill_map, L736, name = get_par_name_from_fittype(fittype = fittype), you are transforming the fit paramters into (E0, LT), so this is not necessary [it is already solved in a posterior commit!].
  8. icaro city is not commented, you need to indicate the meaning of the parameters, in particular what is nStimeprofile
  9. icaro city, I understand it is still a draft, but nevertheless, we should add a check function after the map_builder and map_evol main functions.
  10. icaro city, L80, L81, about the sinks, what are the arguments: "pointlike_event", "pmap_passed"
  11. compute_map, has some hardwired values, this is not acceptable, these parameters should be in a configuration file, even if they are almost never modified [Some modifications done in a later commit].
  12. What is the decision about computing the chi2?
  13. Most of the functions have not an associated test!

Tests:

  • ok get_number_of_bins

@jahernando
Copy link
Collaborator

This is a large commit, better do small commits with specific changes. Otherwise is difficult to follow the flow of changes.

I understant that:

  • some (empty) functions are introduced in the main flow to check the map and the map evolution internally
  • some functions have been renamed to gain in clarity
  • some arguments are not passed by default to gain robustness

Do I miss other changes?

There is a comment of many lines related with time-series, I understant this is the time-evolution part, that will be re-adapted in a new PR.

@carhc Are you preparing to commit more tests of the functions used in the code?

@bpalmeiro bpalmeiro linked an issue Oct 18, 2024 that may be closed by this pull request
6 tasks
@carhc carhc marked this pull request as draft December 30, 2024 10:14
@carhc carhc force-pushed the map_components branch 2 times, most recently from 063b20c to c7606ed Compare December 30, 2024 10:34
Copy link
Collaborator

@gonzaponte gonzaponte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being done with peras

invisible_cities/reco/krmap_functions.py Outdated Show resolved Hide resolved
invisible_cities/reco/krmap_functions.py Outdated Show resolved Hide resolved
invisible_cities/reco/krmap_functions.py Outdated Show resolved Hide resolved
invisible_cities/reco/krmap_functions.py Outdated Show resolved Hide resolved
invisible_cities/reco/krmap_functions.py Outdated Show resolved Hide resolved
invisible_cities/reco/krmap_functions.py Outdated Show resolved Hide resolved
invisible_cities/reco/krmap_functions.py Outdated Show resolved Hide resolved
invisible_cities/reco/krmap_functions.py Outdated Show resolved Hide resolved
invisible_cities/reco/krmap_functions.py Outdated Show resolved Hide resolved
invisible_cities/reco/krmap_functions.py Outdated Show resolved Hide resolved
Comment on lines 172 to 173
fittype : KrFitFunction
Chosen fit function for map computation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer a parameter. Remove blank line before.

fittype : KrFitFunction
Chosen fit function for map computation
bins : Tuple[np.array, np.array]
Tuple containing bins in both axis
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Tuple containing bins in both axis
Tuple containing bins in both axes


Returns
-------

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line

Comment on lines 196 to 197
geom_comb = list(itertools.product(b_center[0], b_center[1]))
r_values = np.array([np.sqrt(x**2+y**2)for x, y in geom_comb])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
geom_comb = list(itertools.product(b_center[0], b_center[1]))
r_values = np.array([np.sqrt(x**2+y**2)for x, y in geom_comb])
geom_comb = np.array(list(itertools.product(*b_center))
r_values = np.sum(geom_comb**2, axis=1)**0.5

Comment on lines 201 to 202
X = list(zip(*geom_comb))[0],
Y = list(zip(*geom_comb))[1],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the suggestion above...

Suggested change
X = list(zip(*geom_comb))[0],
Y = list(zip(*geom_comb))[1],
X = geom_comb.T[0],
Y = geom_comb.T[1],

Comment on lines 132 to 140
@given(integers(min_value = 0, max_value = 1e10),
integers(min_value = 0, max_value = 1e10),
arrays (dtype = np.int64, shape = (2,),
elements = integers(min_value = 1,
max_value = 1e4)))
def test_get_number_of_bins_returns_type(nevents, thr, n_bins):

assert type(krf.get_number_of_bins(nevents, thr) ) == np.ndarray
assert type(krf.get_number_of_bins(n_bins=n_bins)) == np.ndarray
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove hypothesis and do a simple fixed cases. Why do we want to test the output type?

Comment on lines 143 to 147
@given(n_bins=arrays(dtype = int, shape = (2,),
elements = integers(min_value = 2,
max_value = 100)),
n_min=integers(min_value=1, max_value=100),
r_max=floats (min_value=50, max_value=450))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, a single case is enough.

'pval', 'in_active', 'has_min_counts', 'fit_success', 'valid', 'R', 'X', 'Y']

assert all(element in columns for element in df.columns.values)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check also the number of rows in the output and rename the test to something like test_create_df_kr_map_shape.

Also, add another test checking that the non-dummy values set in the function fall in the expected range.

Comment on lines +382 to +383
def valid_bin_counter(map_df : pd.DataFrame,
validity_parameter : Optional[float] = 0.9):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, need tests for this function

Comment on lines 454 to 455
def regularize_map(maps : pd.DataFrame,
x2range : Tuple[float, float]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests. At least:

  • check it doesn't modify the input
  • create a silly map with ones and a few "invalid entries", then check the invalid entries are restored to 1.

Copy link
Collaborator

@gonzaponte gonzaponte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few more comments.

Comment on lines +305 to +310
counts = counts.flatten()
bin_indexes -= 1
bin_labels = np.ravel_multi_index(bin_indexes, dims=(n_xbins, n_ybins),
mode='clip', order = 'F')

return counts, bin_labels
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
counts = counts.flatten()
bin_indexes -= 1
bin_labels = np.ravel_multi_index(bin_indexes, dims=(n_xbins, n_ybins),
mode='clip', order = 'F')
return counts, bin_labels
# indexes 0 and len+1 represent underflow and overflow, thus we subtract 1
bin_labels = np.ravel_multi_index(bin_indexes - 1, dims=(n_xbins, n_ybins),
mode='clip', order = 'F')
return counts.flatten(), bin_labels

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand correctly mode="clip" means that bins out of range will be assigned the closest bin. Is this what we used to do @bpalmeiro ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our case, we had three values for "fiducial":

  • The R max for the selection (just a quality one to ensure everything was inside)

  • The R max for the map, that defines the maximum extension of it. In this case, if the event didn't happen to fall inside the bins, it was disregarded.

  • Also (not sure here it's the most appropriate place to mention, tho), we had a posterior step where the peripheral bins (those further than a fiducial r -but within said Rmax for map production-) were set to nan.

The latter was used to ensure that, in the posterior analysis, all the hits reconstructed outside the volume were automatically flagged as nan.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, the best approach would be to disregard events outside boundaries instead of fakely merging them in the border bins, even if we rely on a previous selection. It's not a very strong opinion, but I think it adds redundancy (and, therefore, a bit of robustness) to the process.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my bad... I assumed that there would not be any events outside the boundaries here since I was expecting a "clean" dst in this part of the code, so when I defined bin_labels like that I didn't think of the consequences in case some event was actually out of the range.

I am thinking in a solution like this one:

Suggested change
counts = counts.flatten()
bin_indexes -= 1
bin_labels = np.ravel_multi_index(bin_indexes, dims=(n_xbins, n_ybins),
mode='clip', order = 'F')
return counts, bin_labels
counts = counts.flatten()
bin_indexes -= 1
valid_mask = ( in_range(bin_indexes[0], 0, n_xbins, right_closed=True) &
in_range(bin_indexes[1], 0, n_ybins, right_closed=True) )
bin_labels = np.full(bin_indexes.shape[1], fill_value=np.nan)
bin_labels[valid_mask] = np.ravel_multi_index((bin_indexes[0, valid_mask],
bin_indexes[1, valid_mask]),
dims=(n_xbins, n_ybins),
order='F')

In case some events are outside the desired range, their label would be a NaN instead of a numerical index, so when grouping the events bin by bin, none would be assigned to those events...

return valid_per


def fit_and_fill_map(map_bin : pd.DataFrame,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this is a pd.Series, not a pd.DataFrame

Comment on lines +471 to +474
outliers = new_map.in_active & ~new_map.valid

if isinstance(x2range, Tuple):
outliers &= ~in_range(new_map.chi2, *x2range)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic is wrong here. The outliers should be those that are in the active and (not valid OR not in range). Truth table:

in_active  valid   x2_in_range  outlier  expected
--------------------------------------------------
True       True    True         False    False   <--- satisfied
True       True    False        False    True    <--- not satisfied
True       False   True         False    True    <--- not satisfied
True       False   False        True     True    <--- satisfied
False      x       x            False    False   <--- satisfied

'pval', 'in_active', 'has_min_counts', 'fit_success', 'valid', 'R', 'X', 'Y']

assert all(element in columns for element in df.columns.values)
assert df.bin.nunique() == n_bins_x*n_bins_y
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the length od the dataframe be also n_binx_x*n_bins_y?

Comment on lines +207 to +215
def test_valid_bin_counter_warning(n_bins, rmax, validity_parameter):
counts = np.array(range(n_bins[0]*n_bins[1]))
krmap = krf.create_df_kr_map(bins_x = np.linspace(-rmax, +rmax, n_bins[0]+1),
bins_y = np.linspace(-rmax, +rmax, n_bins[1]+1),
counts = counts,
n_min = 0,
r_max = np.nextafter(np.sqrt(2)*rmax, np.inf))

krmap.valid.iloc[0 : 9] = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed offline, simplify the data in this test

Comment on lines +217 to +219
if validity_parameter == 1:
with warns(UserWarning, match = "inner bins are not valid."):
krf.valid_bin_counter(map_df = krmap, validity_parameter = validity_parameter)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for validity_parameter = 0.2, you are not checking anything. Check this: https://stackoverflow.com/a/45671804

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ICAROS: calibration map creator
4 participants