Add main functions for the computation of Kr maps #866

carhc · 2024-03-08T17:25:16Z

This PR addresses issue #863 (ICAROS: Calibration map creation) and lies on top of PR #865 (Data preparation functions).

It contains the main and auxiliary functions needed for the core of Icaro city. Many different functions have been defined in order to prepare, fit and produce the Kr calibration maps.

WIP:

Complete tests
Define regularization.
Define how to select map bins that fall within the detector volume (maybe using DB).
Fix the prescription for the p-value calculation.
Cosmetics and unify style. Many of the functions here have been changing almost daily for the last week (even today!), so there may still be inconsistencies, especially in the functions' documentation of parameters and returns.

jahernando · 2024-04-04T15:39:56Z

Comments & Corrections:

get_number_of_bins ok
get_binned_data: what it does is get_bin_counts_and_event_bin_id; change the name!
update_dst extend: update_dst_with_event_bin_id
in calculate_residuals, line 611: is KrFitFunction missing in lin_function?
fit_and_fill_map, I think the try is not necessary.
fit_and_fill_map, I would remove the else associated with if not map_bin['in_active'] or not map_bin['has_min_counts']:, and put the rest of the code (most of the function body) at the first indentation level.
fit_and_fill_map, L736, name = get_par_name_from_fittype(fittype = fittype), you are transforming the fit parameters into (E0, LT), so this is not necessary [it is already solved in a later commit!].
icaro city is not commented, you need to indicate the meaning of the parameters, in particular what is nStimeprofile
icaro city, I understand it is still a draft, but nevertheless, we should add a check function after the map_builder and map_evol main functions.
icaro city, L80, L81, about the sinks, what are the arguments: "pointlike_event", "pmap_passed"
compute_map, has some hardwired values, this is not acceptable, these parameters should be in a configuration file, even if they are almost never modified [Some modifications done in a later commit].
What is the decision about computing the chi2?
Most of the functions do not have an associated test!

Tests:

ok get_number_of_bins

jahernando · 2024-05-14T13:45:29Z

This is a large commit; it is better to make small commits with specific changes. Otherwise it is difficult to follow the flow of changes.

I understand that:

some (empty) functions are introduced in the main flow to check the map and the map evolution internally
some functions have been renamed to gain in clarity
some arguments are not passed by default to gain robustness

Do I miss other changes?

There is a commented-out block of many lines related to time series. I understand this is the time-evolution part, which will be re-adapted in a new PR.

@carhc Are you preparing to commit more tests of the functions used in the code?

gonzaponte

This is being done with peras

invisible_cities/reco/krmap_functions.py

gonzaponte · 2025-01-09T10:29:20Z

invisible_cities/reco/krmap_functions.py

+    fittype : KrFitFunction
+        Chosen fit function for map computation


This is no longer a parameter. Remove blank line before.

gonzaponte · 2025-01-09T10:31:08Z

invisible_cities/reco/krmap_functions.py

+    fittype : KrFitFunction
+        Chosen fit function for map computation
+    bins : Tuple[np.array, np.array]
+        Tuple containing bins in both axis


Suggested change

-        Tuple containing bins in both axis
+        Tuple containing bins in both axes

gonzaponte · 2025-01-09T10:31:21Z

invisible_cities/reco/krmap_functions.py

+
+    Returns
+    -------
+


gonzaponte · 2025-01-09T10:36:29Z

invisible_cities/reco/krmap_functions.py

+    geom_comb = list(itertools.product(b_center[0], b_center[1]))
+    r_values  = np.array([np.sqrt(x**2+y**2)for x, y in geom_comb])


Suggested change

-    geom_comb = list(itertools.product(b_center[0], b_center[1]))
-    r_values  = np.array([np.sqrt(x**2+y**2)for x, y in geom_comb])
+    geom_comb = np.array(list(itertools.product(*b_center)))
+    r_values  = np.sum(geom_comb**2, axis=1)**0.5

gonzaponte · 2025-01-09T10:37:20Z

invisible_cities/reco/krmap_functions.py

+                                  X              = list(zip(*geom_comb))[0],
+                                  Y              = list(zip(*geom_comb))[1],


Following the suggestion above...

Suggested change

-                                  X              = list(zip(*geom_comb))[0],
-                                  Y              = list(zip(*geom_comb))[1],
+                                  X              = geom_comb.T[0],
+                                  Y              = geom_comb.T[1],
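
A minimal sketch of why the two suggestions above are equivalent to the original lines, assuming hypothetical bin centers (the values of `b_center` below are illustrative and not taken from the PR):

```python
import itertools
import numpy as np

# Hypothetical bin centers along X and Y (illustrative values only).
b_center = (np.array([-1.0, 0.0, 1.0]), np.array([-2.0, 2.0]))

# Original approach: Python-level loop over all (x, y) pairs.
pairs_loop = list(itertools.product(b_center[0], b_center[1]))
r_loop     = np.array([np.sqrt(x**2 + y**2) for x, y in pairs_loop])

# Suggested approach: a single (n_pairs, 2) array and vectorized arithmetic.
geom_comb = np.array(list(itertools.product(*b_center)))
r_values  = np.sum(geom_comb**2, axis=1)**0.5

# With an array, the X and Y columns are just the rows of the transpose.
X, Y = geom_comb.T

assert np.allclose(r_loop, r_values)
```

Since `itertools.product` varies its last argument fastest, `X` repeats each x center once per y center, which matches what `list(zip(*geom_comb))[0]` returned before.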

gonzaponte · 2025-01-09T11:05:59Z

invisible_cities/reco/krmap_functions_test.py

+@given(integers(min_value = 0, max_value = 1e10),
+       integers(min_value = 0, max_value = 1e10),
+       arrays  (dtype = np.int64,  shape = (2,),
+                elements = integers(min_value = 1,
+                                    max_value = 1e4)))
+def test_get_number_of_bins_returns_type(nevents, thr, n_bins):
+
+    assert type(krf.get_number_of_bins(nevents, thr) ) == np.ndarray
+    assert type(krf.get_number_of_bins(n_bins=n_bins)) == np.ndarray


Remove hypothesis and use a few simple fixed cases. Why do we want to test the output type?

gonzaponte · 2025-01-09T11:07:32Z

invisible_cities/reco/krmap_functions_test.py

+@given(n_bins=arrays(dtype = int,      shape = (2,),
+                     elements = integers(min_value = 2,
+                                         max_value = 100)),
+       n_min=integers(min_value=1,  max_value=100),
+       r_max=floats  (min_value=50, max_value=450))


Again, a single case is enough.

gonzaponte · 2025-01-09T11:08:51Z

invisible_cities/reco/krmap_functions_test.py

+                'pval', 'in_active', 'has_min_counts', 'fit_success', 'valid', 'R', 'X', 'Y']
+
+    assert all(element in columns for element in df.columns.values)
+


Check also the number of rows in the output and rename the test to something like test_create_df_kr_map_shape.

Also, add another test checking that the non-dummy values set in the function fall in the expected range.

gonzaponte · 2025-01-09T11:14:46Z

invisible_cities/reco/krmap_functions.py

+def valid_bin_counter(map_df             : pd.DataFrame,
+                      validity_parameter : Optional[float] = 0.9):


Also, need tests for this function

gonzaponte · 2025-01-09T11:17:28Z

invisible_cities/reco/krmap_functions.py

+def regularize_map(maps    : pd.DataFrame,
+                   x2range : Tuple[float, float]):


Tests. At least:

check it doesn't modify the input

create a silly map with ones and a few "invalid entries", then check the invalid entries are restored to 1.
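
A rough sketch of what these two tests could look like. `regularize_map_sketch` is a hypothetical stand-in written only for this example (it replaces `e0` in outlier bins with the mean of the good bins); the real `regularize_map` may behave differently, and the column names `e0`, `chi2`, `in_active` and `valid` are assumptions:

```python
import numpy as np
import pandas as pd


def regularize_map_sketch(maps, x2range):
    """Hypothetical stand-in, NOT the PR's regularize_map."""
    new_map  = maps.copy()
    in_x2    = new_map["chi2"].between(*x2range)
    outliers = new_map["in_active"] & (~new_map["valid"] | ~in_x2)
    good     = new_map["in_active"] & ~outliers
    # Replace outlier values with the mean of the well-behaved bins.
    new_map.loc[outliers, "e0"] = new_map.loc[good, "e0"].mean()
    return new_map


def make_silly_map():
    # A map full of ones with two "invalid" entries.
    n     = 9
    krmap = pd.DataFrame(dict(e0        = np.ones(n),
                              chi2      = np.ones(n),
                              in_active = np.ones(n, dtype=bool),
                              valid     = np.ones(n, dtype=bool)))
    krmap.loc[[2, 5], "e0"   ] = np.nan
    krmap.loc[[2, 5], "valid"] = False
    return krmap


def test_regularize_map_does_not_modify_input():
    krmap  = make_silly_map()
    before = krmap.copy()
    regularize_map_sketch(krmap, x2range=(0, 2))
    pd.testing.assert_frame_equal(krmap, before)


def test_regularize_map_restores_invalid_entries():
    out = regularize_map_sketch(make_silly_map(), x2range=(0, 2))
    assert np.allclose(out["e0"], 1)
```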

gonzaponte

a few more comments.

gonzaponte · 2025-01-31T09:29:27Z

invisible_cities/reco/krmap_functions.py

+    counts       = counts.flatten()
+    bin_indexes -= 1
+    bin_labels   = np.ravel_multi_index(bin_indexes, dims=(n_xbins, n_ybins),
+                                        mode='clip', order = 'F')
+
+    return counts, bin_labels


Suggested change

-    counts       = counts.flatten()
-    bin_indexes -= 1
-    bin_labels   = np.ravel_multi_index(bin_indexes, dims=(n_xbins, n_ybins),
-                                        mode='clip', order = 'F')
-
-    return counts, bin_labels
+    # indexes 0 and len+1 represent underflow and overflow, thus we subtract 1
+    bin_labels = np.ravel_multi_index(bin_indexes - 1, dims=(n_xbins, n_ybins),
+                                      mode='clip', order = 'F')
+
+    return counts.flatten(), bin_labels

If I understand correctly, mode="clip" means that events out of range will be assigned to the closest bin. Is this what we used to do, @bpalmeiro?

bpalmeiro · 2025-02-05

In our case, we had three values for "fiducial":

The R max for the selection (just a quality one to ensure everything was inside)

The R max for the map, that defines the maximum extension of it. In this case, if the event didn't happen to fall inside the bins, it was disregarded.

Also (not sure this is the most appropriate place to mention it, though), we had a later step where the peripheral bins (those beyond a fiducial r but within said R max for map production) were set to nan.

The latter was used to ensure that, in the posterior analysis, all the hits reconstructed outside the volume were automatically flagged as nan.

That said, the best approach would be to disregard events outside the boundaries instead of artificially merging them into the border bins, even if we rely on a previous selection. It's not a very strong opinion, but I think it adds redundancy (and, therefore, a bit of robustness) to the process.

carhc · 2025-02-07

Yes, my bad... I assumed that there would not be any events outside the boundaries here since I was expecting a "clean" dst in this part of the code, so when I defined bin_labels like that I didn't think of the consequences in case some event was actually out of the range.

I am thinking of a solution like this one:

Suggested change

-    counts       = counts.flatten()
-    bin_indexes -= 1
-    bin_labels   = np.ravel_multi_index(bin_indexes, dims=(n_xbins, n_ybins),
-                                        mode='clip', order = 'F')
-
-    return counts, bin_labels
+    counts       = counts.flatten()
+    bin_indexes -= 1
+    valid_mask   = ( in_range(bin_indexes[0], 0, n_xbins, right_closed=True) &
+                     in_range(bin_indexes[1], 0, n_ybins, right_closed=True) )
+    bin_labels   = np.full(bin_indexes.shape[1], fill_value=np.nan)
+    bin_labels[valid_mask] = np.ravel_multi_index((bin_indexes[0, valid_mask],
+                                                   bin_indexes[1, valid_mask]),
+                                                  dims=(n_xbins, n_ybins),
+                                                  order='F')

In case some events are outside the desired range, their label would be a NaN instead of a numerical index, so when grouping the events bin by bin, none would be assigned to those events...
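
To make the difference between the two behaviours discussed in this thread concrete, here is a small sketch under stated assumptions: the bin edges and events are made up, `np.digitize` stands in for whatever produces `bin_indexes` in the PR, and explicit half-open comparisons are used in place of `in_range`:

```python
import numpy as np

n_xbins, n_ybins = 3, 2
x_edges = np.linspace(0, 3, n_xbins + 1)
y_edges = np.linspace(0, 2, n_ybins + 1)

# Hypothetical events; the last one lies outside the map in X.
x = np.array([0.5, 1.5, 2.5, 7.0])
y = np.array([0.5, 1.5, 0.5, 0.5])

# np.digitize returns 0 for underflow and len(edges) for overflow,
# so after subtracting 1 the valid bins are 0 .. n - 1.
bin_indexes = np.array([np.digitize(x, x_edges),
                        np.digitize(y, y_edges)]) - 1

# mode='clip': the out-of-range event is silently merged into a border bin.
clipped = np.ravel_multi_index(tuple(bin_indexes), dims=(n_xbins, n_ybins),
                               mode='clip', order='F')

# Masking: out-of-range events get NaN instead of a bin label.
valid_mask = ((bin_indexes[0] >= 0) & (bin_indexes[0] < n_xbins) &
              (bin_indexes[1] >= 0) & (bin_indexes[1] < n_ybins))
bin_labels = np.full(bin_indexes.shape[1], fill_value=np.nan)
bin_labels[valid_mask] = np.ravel_multi_index((bin_indexes[0, valid_mask],
                                               bin_indexes[1, valid_mask]),
                                              dims=(n_xbins, n_ybins),
                                              order='F')

print(clipped)     # the last event shares a label with the third one
print(bin_labels)  # the last event is labelled NaN
```

Note that storing NaN forces `bin_labels` to be a float array, so any later grouping by bin label has to tolerate float labels.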

gonzaponte · 2025-01-31T12:10:47Z

invisible_cities/reco/krmap_functions.py

+    return valid_per
+
+
+def fit_and_fill_map(map_bin : pd.DataFrame,


If I understand correctly, this is a pd.Series, not a pd.DataFrame

gonzaponte · 2025-01-31T12:35:19Z

invisible_cities/reco/krmap_functions.py

+    outliers  = new_map.in_active & ~new_map.valid
+
+    if isinstance(x2range, Tuple):
+          outliers &= ~in_range(new_map.chi2, *x2range)


I think the logic is wrong here. The outliers should be those that are in the active and (not valid OR not in range). Truth table:

in_active  valid  x2_in_range  outlier  expected
--------------------------------------------------
True       True   True         False    False     <--- satisfied
True       True   False        False    True      <--- not satisfied
True       False  True         False    True      <--- not satisfied
True       False  False        True     True      <--- satisfied
False      x      x            False    False     <--- satisfied
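
The truth table can be checked with plain boolean arrays. This is only an illustrative sketch: the arrays below enumerate the table rows (the last row, marked `x`, is expanded into two cases) and are not data from the PR:

```python
import numpy as np

# One entry per row of the truth table.
in_active   = np.array([True, True , True , True , False, False])
valid       = np.array([True, True , False, False, True , False])
x2_in_range = np.array([True, False, True , False, True , False])

# Logic as written in the diff under review.
outliers_pr = in_active & ~valid
outliers_pr = outliers_pr & ~x2_in_range

# Logic proposed in the comment: in active AND (not valid OR not in range).
outliers_fix = in_active & (~valid | ~x2_in_range)

expected = np.array([False, True, True, True, False, False])

assert not np.array_equal(outliers_pr , expected)
assert     np.array_equal(outliers_fix, expected)
```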

gonzaponte · 2025-01-31T12:38:40Z

invisible_cities/reco/krmap_functions_test.py

+                'pval', 'in_active', 'has_min_counts', 'fit_success', 'valid', 'R', 'X', 'Y']
+
+    assert all(element in columns for element in df.columns.values)
+    assert df.bin.nunique() == n_bins_x*n_bins_y


Should the length of the dataframe also be n_bins_x*n_bins_y?

gonzaponte · 2025-01-31T12:50:21Z

invisible_cities/reco/krmap_functions_test.py

+def test_valid_bin_counter_warning(n_bins, rmax, validity_parameter):
+    counts = np.array(range(n_bins[0]*n_bins[1]))
+    krmap  = krf.create_df_kr_map(bins_x = np.linspace(-rmax, +rmax, n_bins[0]+1),
+                                  bins_y = np.linspace(-rmax, +rmax, n_bins[1]+1),
+                                  counts = counts,
+                                  n_min  = 0,
+                                  r_max  = np.nextafter(np.sqrt(2)*rmax, np.inf))
+
+    krmap.valid.iloc[0 : 9] = True


As discussed offline, simplify the data in this test.

gonzaponte · 2025-01-31T12:52:50Z

invisible_cities/reco/krmap_functions_test.py

+    if validity_parameter == 1:
+        with warns(UserWarning, match = "inner bins are not valid."):
+            krf.valid_bin_counter(map_df = krmap, validity_parameter = validity_parameter)


For validity_parameter = 0.2, you are not checking anything. Check this: https://stackoverflow.com/a/45671804
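
One possible way to cover both branches, sketched with the standard-library `warnings` module only. `check_fraction` is a hypothetical stand-in for the warning behaviour of `valid_bin_counter`, and this is not necessarily the approach described in the linked answer:

```python
import warnings


def check_fraction(valid_fraction, validity_parameter):
    """Hypothetical stand-in: warn when too few bins are valid."""
    if valid_fraction < validity_parameter:
        warnings.warn("inner bins are not valid.", UserWarning)
    return valid_fraction


def emitted_warnings(func, *args):
    """Run func and return the messages of all warnings it emitted."""
    with warnings.catch_warnings(record=True) as record:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        func(*args)
    return [str(w.message) for w in record]


# Strict threshold: the warning must be emitted.
assert emitted_warnings(check_fraction, 0.5, 1.0) == ["inner bins are not valid."]

# Loose threshold: assert explicitly that nothing was emitted.
assert emitted_warnings(check_fraction, 0.5, 0.2) == []
```

Recording the warnings and asserting on the list makes the "no warning" case an actual check rather than a branch that is silently skipped.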

carhc requested review from jahernando, gonzaponte and bpalmeiro March 8, 2024 17:25

carhc assigned bpalmeiro and carhc Mar 8, 2024

bpalmeiro linked an issue Oct 18, 2024 that may be closed by this pull request

ICAROS: calibration map creator #863

Open

6 tasks

carhc force-pushed the map_components branch from cb58021 to f1c6ab6 Compare December 27, 2024 13:25

carhc marked this pull request as draft December 30, 2024 10:14

carhc force-pushed the map_components branch 2 times, most recently from 063b20c to c7606ed Compare December 30, 2024 10:34

carhc added 2 commits December 30, 2024 13:20

Add functions for map computation

5b331f8

Add tests for map computation

aedc73e

carhc force-pushed the map_components branch from c7606ed to aedc73e Compare December 30, 2024 12:25

gonzaponte reviewed Dec 30, 2024


carhc added 6 commits January 3, 2025 12:08

Update valid_bin_counter function

7593cd9

Update fit_and_fill_map function

e943e9c

Update create_df_kr_map function

bd79755

Update calculate_residuals function

a27dd34

Remove find_outliers function

f480e1e

Remove get_XY_bins function

58b2824

gonzaponte removed request for jahernando and bpalmeiro January 9, 2025 10:18

carhc added 3 commits January 9, 2025 11:24

Remove get_XY_bins tests

7928ec4

Update calculate_pval function

a8de173

Update regularize_maps function

24419d8

gonzaponte reviewed Jan 9, 2025


carhc added 2 commits January 28, 2025 18:33

Update create_df_kr_map function

cd4e170

Update get_bin_counts_and_event_bin_id function

6cb42eb

carhc added 12 commits January 28, 2025 18:36

Update fit_and_fill_map function

2ee3482

Update regularize_map function

11cb1e5

Cosmetics

ddfc3ea

Update test for get_number_on_bins

887eac6

Update test for create_df_kr_map function

c4ca129

Add test for get_bin_counts_and_event_bin_id function

46955a9

Update valid_bin_counter function

b30b7b8

Add tests for valid_bin_counter function

e6d0256

Cosmetics

bae44b8

Update valid_bin_counter function

7775792

Update regularize_map_function

0a48e25

Add test for regularize_map function

9cb9117

gonzaponte reviewed Jan 31, 2025


