I-ALiRT: Create SWAPI count rate optimization function #1705

Merged
torimarbois merged 12 commits into IMAP-Science-Operations-Center:dev from torimarbois:swapi_count_rate
Jun 18, 2025

Conversation

@torimarbois
Contributor

Change Summary

Overview

Adds the main piece of SWAPI I-ALiRT processing: a substantial optimization function for coincidence count rates. The code can be compared against the science document located here, if desired.

New Files

  • tests/ialirt/test_data/ialirt_test_data.csv
    • Contains the energy passbands/steps and count rates the SWAPI team used to verify their model. These energy steps are not expected to change, so I'm using the file as a sort of lookup table.

Updated Files

  • ialirt/l0/process_swapi.py
    • Adds the new optimization function and organizes its call and outputs within the overall processing function.
  • tests/ialirt/unit/test_process_swapi.py
    • Test that the optimization function does not return None.

Testing

I added unit tests that verify the results aren't None. Verifying the output values against real data wasn't feasible to implement, mainly because I don't have an exact dataset that produces a known set of optimized parameters.

However, I did graph my results, comparing the model run with the optimized parameters this code found against the model run with the parameters the science team found using the six highest count rate values (which is what the science team did):
[image: comparison plot of the two models and the provided data points]
The blue line is the model using the optimized parameters from the science document, the orange line is the model using the optimized parameters that this code found, and the blue points are the provided data points. As you can see, they are very close!

Contributor

@laspsandoval laspsandoval left a comment

Most of my comments are about documentation, since it is difficult for me to follow where the equations come from without a SWAPI background. Nice job. Thanks for doing this.

)
energy_passbands = energy_data["Energy [eV/q]"].to_numpy()

def count_rate(
Contributor

I personally prefer keeping functions separately. It's simpler for unit testing and reuse is less complicated.

"""Placeholder test for the process_swapi_ialirt function."""
"""Test that the process_swapi_ialirt function returns expected keys."""

swapi_result = process_swapi_ialirt(xarray_data)
Contributor

Based on the description of the test, add some more keys to test.

count_rates = energy_data["Count Rates [Hz]"].to_numpy()

result = optimize_pseudo_parameters(count_rates)
assert result is not None
Contributor

Do you have a result to compare to? If not could you create some input data and then have expected output data?

process_swapi_ialirt,
)
from imap_processing.utils import packet_file_to_datasets

Contributor

Make certain to add a test for the count_rates function.

Particle coincidence count rate.
"""
# Scientific constants used in optimization model
boltz = 1.380649e-23 # Boltzmann constant, J/K
Contributor

This is enough constants that I think you could bring it out to a separate dataclass like:
/Users/lasa6858/imap_processing/imap_processing/ultra/constants.py

That way they are stored in a single place and we can see the algorithm more closely.
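A minimal sketch of that suggestion, assuming a frozen dataclass; the `SwapiConstants` name and its placement are illustrative, not the repo's actual layout, and the values mirror the inline constants quoted in this thread:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapiConstants:
    """Physical constants used by the SWAPI count-rate model (SI units).

    Values mirror the inline constants in process_swapi.py; the class
    name and location here are illustrative only.
    """

    boltzmann: float = 1.380649e-23        # Boltzmann constant, J/K
    atomic_mass: float = 1.6605390666e-27  # atomic mass unit, kg
    effective_area: float = 3.3e-5 * 1e-4  # effective area, m^2

    @property
    def proton_mass(self) -> float:
        """Proton mass in kg (1.007276466621 atomic mass units)."""
        return 1.007276466621 * self.atomic_mass
```

Freezing the dataclass keeps the constants immutable while letting the algorithm reference them by name in one place.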

speed = speed * 1000 # convert km/s to m/s
density = density * 1e6 # convert 1/cm**3 to 1/m**3

return (
Contributor

In the notes section could you put where these equations are from?

)
)

initial_param_guess = np.array([550, 5.27, 1e5])
Contributor

Why is this the guess here?

Contributor Author

I thought it should go before the function call. Do you think it should be elsewhere?

Contributor Author

Laura suggested adding a comment explaining where this guess was pulled from in the algorithm document.

Collaborator

The initial guess for the pseudo-speed can be obtained from the energy corresponding to the maximum/peak count rate (energy_peak_rate), i.e.,
speed_guess = sqrt(2 * energy_peak_rate * 1.60218e-19 / proton_mass) / 1000 km/s.
It is not straightforward to come up with a good initial guess for the pseudo-density and temperature. Some nominal values, like the following, should be okay:
dens_guess = 5 cm^-3, and
T_guess = 1e5 K.
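That guess scheme could be sketched as follows; the `initial_param_guess` function name is hypothetical, and the constants are standard CODATA values:

```python
import numpy as np

EV_TO_J = 1.60218e-19         # 1 eV in joules
PROTON_MASS = 1.67262192e-27  # proton mass, kg


def initial_param_guess(energy_passbands, count_rates):
    """Initial guess [pseudo-speed (km/s), pseudo-density (cm^-3), pseudo-temperature (K)].

    The speed guess comes from the energy at the peak count rate via
    E = (1/2) m v^2; density and temperature use the nominal values
    suggested above (5 cm^-3 and 1e5 K).
    """
    energy_peak_rate = energy_passbands[np.argmax(count_rates)]  # eV/q
    speed_kms = np.sqrt(2.0 * energy_peak_rate * EV_TO_J / PROTON_MASS) / 1000.0
    return np.array([speed_kms, 5.0, 1e5])
```

For a peak near 1000 eV this yields a speed guess of roughly 440 km/s, in the range of typical solar wind speeds.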

energy_data = pd.read_csv(
f"{imap_module_directory}/tests/ialirt/test_data/ialirt_test_data.csv"
)
count_rates = energy_data["Count Rates [Hz]"].to_numpy()
Contributor

So will count_rates be a lookup table that is used in operations? Or is it just test data? If it's a lookup table we should put it in the ialirt (not test) directory.

Contributor Author

@bishwassth are the energy steps in the spreadsheet you sent me (in ialirt_test_data.csv) going to be the ones used for real-time processing? Should I use them as a lookup table for this parameter optimization?

Collaborator

The first 63 energy steps are very close to what will be used for real-time processing, but may not match exactly. The energies for real-time processing will come from the ESA unit conversion ADP and will be in the form of a lookup table (e.g., Table 3 in the algorithms document). BTW, could you confirm whether the I-ALiRT code is using the "ESA unit conversion ADP" or not? If not, we have to supply the energies in a different lookup table.

Contributor Author

I see the file called imap_swapi_esa-unit-conversion_20250211_v000.csv in our lookup table folder for SWAPI. Is this what you mean? And are the energy passbands listed in this file? None of the columns are named similarly to that.

Collaborator

Yes, that is the correct file to use. You have to read the columns for "Energy" in this file from ESA step # 0-62. Note that the energy values for the first 34 ESA step # (0-33) are fixed to "1,163", which will be updated to realistic values in space.

@laspsandoval
Contributor

Another question: how close should the orange and blue lines be to each other? Did the instrument team give any criteria for success? Could you plot the difference of the two as well?

"pseudo_temperature": [],
}

if count_rates.ndim > 1:
Collaborator

Are we expecting both array and float inputs? Can you settle on one or the other and fix the input type here?

Collaborator

@bishwassth bishwassth left a comment

My main comment is on the implementation of the non-linear fitting.

solution_dict["pseudo_density"].append(sol[0][1])
solution_dict["pseudo_temperature"].append(sol[0][2])
else:
sol = curve_fit(count_rate, energy_passbands, count_rates, initial_param_guess)
Collaborator

I suggest doing a weighted fit (inverse variance weighted). The "Count Rates Error" in the CSV file (3rd column) can be used as weights. Also, suggest checking the reduced chi-square value for the goodness of the fit.

In addition, are you using only 6 energy bins around the maximum count rates (1 where max count rate is + 3 on the left + 2 on the right) for this fitting?
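A sketch of an inverse-variance weighted fit with a reduced chi-square check. The Gaussian `model`, the data, and the starting values here are synthetic stand-ins for illustration, not SWAPI's actual count-rate model or values:

```python
import numpy as np
from scipy.optimize import curve_fit


# Gaussian stand-in for the SWAPI count-rate model (the real model differs).
def model(energy, amplitude, center, width):
    return amplitude * np.exp(-(((energy - center) / width) ** 2))


rng = np.random.default_rng(42)
energy = np.linspace(500.0, 2000.0, 6)           # 6 bins around the peak
true_rates = model(energy, 100.0, 1200.0, 300.0)
errors = np.sqrt(true_rates) + 1.0               # stand-in count-rate errors
rates = true_rates + rng.normal(0.0, errors)

# Passing sigma with absolute_sigma=True makes curve_fit minimize the
# inverse-variance weighted residual sum: sum(((y - f) / sigma) ** 2).
popt, pcov = curve_fit(model, energy, rates, p0=[90.0, 1100.0, 250.0],
                       sigma=errors, absolute_sigma=True)

# Reduced chi-square: chi^2 divided by degrees of freedom (N points - M params).
chi2 = np.sum(((rates - model(energy, *popt)) / errors) ** 2)
chi2_red = chi2 / (len(energy) - len(popt))
```

A reduced chi-square near 1 suggests the fit is consistent with the stated errors; values much larger than 1 indicate a poor fit or underestimated errors.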

Contributor Author

I don't have a formula to calculate the count rates error in real-time processing, so unless one gets created and I can implement it in the code, it doesn't make sense to add the errors as weights here when I only have a single instance of them in the test data you sent me. Unless the count rates error remains the same for every passband every time?

I can work on getting the chi-square value from curve_fit, but it's surprisingly not a straightforward calculation.

Yes, the fits you see in the plots shown above include only those 6 energy steps.

Collaborator

I was under the impression that the SWAPI science processing code up to level 2 was used for SWAPI I-ALiRT, just for the coarse scan (not the fine scans), and that it creates arrays of energy (which is always fixed for coarse scans), count rates, and count rate errors for each 12 s sweep. If it only uses the part of the code that creates the energy and count rate arrays, I can provide a conversion factor from count rates to count rate errors. Of course, the count rate error doesn't remain the same over time.

Contributor Author

I was mistaken; there is code available for me to calculate the count rate error. Right now, the SWAPI science processing code calculates it as the sqrt() of the count rate. Is that the same conversion factor I should use for this?

Collaborator

The error in the count rate is calculated as: sqrt(counts)/TIME_PER_BIN, where TIME_PER_BIN = 12/72 s and counts are produced from L1 science processing.
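That formula could be sketched as follows; TIME_PER_BIN follows the comment above, while the helper function name is hypothetical:

```python
import numpy as np

TIME_PER_BIN = 12.0 / 72.0  # seconds per energy bin: a 12 s sweep over 72 bins


def count_rate_error(counts):
    """Poisson count-rate uncertainty: sigma(rate) = sqrt(counts) / TIME_PER_BIN.

    `counts` are the raw counts produced by L1 science processing; the
    corresponding rate is counts / TIME_PER_BIN, and since sigma(counts)
    is sqrt(counts) for Poisson statistics, the rate error follows.
    """
    return np.sqrt(np.asarray(counts, dtype=float)) / TIME_PER_BIN
```

For example, 100 counts in a bin gives a rate of 600 Hz with an uncertainty of 60 Hz.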

Collaborator

@torimarbois In this example, the pseudo speed obtained from fitting looks very close to what I reported in the algorithms document. I am not sure which test data you are referring to. The subset of data (6 energy bins around the proton peak) I provided in the previous comment is extracted from the test data: "ialirt_test_data.csv". Note that the zero count rates in the test data and some non-zero count rates don't belong to the solar wind protons, and we need to exclude them during the fitting. Therefore, we choose only 6 energy bins around the peak count rates for the fitting. I hope this makes sense.

Contributor Author

@bishwassth should the fitting for each sweep always only be the 6 energy bins around the peak count rate?

Contributor Author

@bishwassth sorry to rehash this one more time, but I want to make sure we're on the same page - you're saying that for every fitting, even during real-time processing (when we're NOT using test data), only the 6 points around the peak should be included in the model input. Right?

Collaborator

@torimarbois No worries. Yes, only 6 energy bins around the peak count rate should be included in the fitting. However, I would note that the position of the peak count rate in the ESA energy bins changes over time during real-time processing because of the variability in the solar wind conditions. So we have to find the position of the peak count rate in SWAPI's coarse energy bins in order to select the 6 energy bins used for fitting.
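A sketch of that selection, assuming a hypothetical helper that picks the peak bin plus 3 bins to its left and 2 to its right, as described earlier in this thread:

```python
import numpy as np


def select_peak_window(energy, rates, n_left=3, n_right=2):
    """Return the 6 bins around the peak count rate: the peak bin plus
    3 bins to its left and 2 to its right (clipped at the array edges).

    The peak position is found per sweep with argmax, since it shifts
    across the ESA energy bins with changing solar wind conditions.
    """
    peak = int(np.argmax(rates))
    start = max(peak - n_left, 0)
    stop = min(peak + n_right + 1, len(rates))
    return energy[start:stop], rates[start:stop]
```

Restricting the fit to this window excludes the zero and non-proton count rates the collaborator mentions above.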

boltz = 1.380649e-23 # Boltzmann constant, J/K
at_mass = 1.6605390666e-27 # atomic mass, kg
prot_mass = 1.007276466621 * at_mass # mass of proton, kg
eff_area = 3.3e-5 * 1e-4 # effective area, meters squared
Collaborator

Can we edit this code directly later to update the value of the effective area?

Contributor Author

Yes.

@torimarbois
Contributor Author

Another question: how close should the orange and blue lines be to each other? Did the instrument team give any criteria for success? Could you plot the difference of the two as well?

@laspsandoval This is the percent difference between the expected count rates and the count rates calculated with the optimized parameters, for the 6 points surrounding the maximum count rate. @bishwassth is this satisfactory?

[image: percent-difference plot]

Contributor

@laspsandoval laspsandoval left a comment

LGTM!

@torimarbois torimarbois merged commit f9edeaa into IMAP-Science-Operations-Center:dev Jun 18, 2025
12 of 14 checks passed
@bourque bourque added this to IMAP Jun 30, 2025
@bourque bourque added this to the June 2025 milestone Jun 30, 2025
@bourque bourque moved this to Done in IMAP Jun 30, 2025