I-ALiRT: Create SWAPI count rate optimization function #1705
base: dev
Conversation
Most of my comments are about documentation, since without a SWAPI background it is difficult for me to follow where the equations are coming from. Nice job. Thanks for doing this.
```python
)
energy_passbands = energy_data["Energy [eV/q]"].to_numpy()


def count_rate(
```
I personally prefer keeping functions separately. It's simpler for unit testing and reuse is less complicated.
```diff
@@ -87,7 +90,20 @@ def test_decom_packets(xarray_data, swapi_test_data):


 def test_process_swapi_ialirt(xarray_data):
-    """Placeholder test for the process_swapi_ialirt function."""
+    """Test that the process_swapi_ialirt function returns expected keys."""

     swapi_result = process_swapi_ialirt(xarray_data)
```
Based on the description of the test, add some more keys to test.
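For instance, a minimal sketch of such a check, assuming the pseudo-parameter keys that appear in the solution_dict later in this diff:

```python
def test_process_swapi_ialirt_keys(xarray_data):
    """Check that each expected pseudo-parameter key is present."""
    swapi_result = process_swapi_ialirt(xarray_data)
    for key in ("pseudo_speed", "pseudo_density", "pseudo_temperature"):
        assert key in swapi_result
```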
```python
count_rates = energy_data["Count Rates [Hz]"].to_numpy()

result = optimize_pseudo_parameters(count_rates)
assert result is not None
```
Do you have a result to compare to? If not could you create some input data and then have expected output data?
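One option, sketched here under the assumption that count_rate is the forward model and optimize_pseudo_parameters returns a dict of pseudo-parameter lists, is to synthesize count rates from known parameters and check that the optimizer recovers them:

```python
import numpy as np

# Hypothetical round-trip test: known parameters -> synthetic rates -> fit.
true_params = (550.0, 5.27, 1e5)  # speed [km/s], density [cm^-3], temp [K]
synthetic_rates = count_rate(energy_passbands, *true_params)

result = optimize_pseudo_parameters(synthetic_rates)
np.testing.assert_allclose(result["pseudo_speed"], true_params[0], rtol=1e-2)
```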
```python
from imap_processing.ialirt.l0.process_swapi import (
    optimize_pseudo_parameters,
    process_swapi_ialirt,
)
from imap_processing.utils import packet_file_to_datasets
```
Make certain to add a test for the count_rates function.
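A minimal sketch of such a test; the count_rate signature is inferred from the curve_fit call in this diff, and the energies are the example values quoted later in this thread:

```python
import numpy as np
from imap_processing.ialirt.l0.process_swapi import count_rate


def test_count_rate_finite():
    """count_rate should return finite, non-negative rates for nominal inputs."""
    energies = np.array([1257.56, 1369.26, 1489.38, 1621.08, 1764.61, 1921.41])
    rates = count_rate(energies, 550.0, 5.27, 1e5)
    assert np.all(np.isfinite(rates))
    assert np.all(np.asarray(rates) >= 0)
```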
```python
        Particle coincidence count rate.
    """
    # Scientific constants used in optimization model
    boltz = 1.380649e-23  # Boltzmann constant, J/K
```
This is enough constants that I think you could pull them out into a separate dataclass, like:
/Users/lasa6858/imap_processing/imap_processing/ultra/constants.py
That way they are stored in a single place and we can see the algorithm more closely.
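For illustration, a sketch of such a dataclass; SwapiConstants is a hypothetical name, and the values are copied from this diff:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapiConstants:
    """Physical constants used by the SWAPI count rate model."""

    boltz: float = 1.380649e-23  # Boltzmann constant, J/K
    at_mass: float = 1.6605390666e-27  # atomic mass unit, kg
    prot_mass: float = 1.007276466621 * 1.6605390666e-27  # proton mass, kg
    eff_area: float = 3.3e-5 * 1e-4  # effective area, m^2
```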
```python
    speed = speed * 1000  # convert km/s to m/s
    density = density * 1e6  # convert 1/cm**3 to 1/m**3

    return (
```
In the notes section could you put where these equations are from?
```python
        )
    )

    initial_param_guess = np.array([550, 5.27, 1e5])
```
Why is this the guess here?
I thought it should go before the function call. Do you think it should be elsewhere?
Laura suggested adding a comment explaining where this guess was pulled from in the algorithm document.
The initial guess for the pseudo-speed can be obtained from the energy corresponding to the maximum/peak count rate (energy_peak_rate), i.e., speed_guess = sqrt(2 * energy_peak_rate * 1.60218e-19 / proton_mass) / 1000 km/s.
It is not straightforward to come up with a good initial guess for the pseudo-density and temperature. Some nominal values, like the following, should be okay: dens_guess = 5 cm^-3 and T_guess = 1e5 K.
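In code, that guess construction could look like the following sketch (energy_passbands and count_rates as used elsewhere in this PR; the constants mirror those in count_rate):

```python
import numpy as np

EV_TO_J = 1.60218e-19  # eV -> J
PROT_MASS = 1.007276466621 * 1.6605390666e-27  # proton mass, kg

# Energy [eV/q] at the maximum observed count rate.
energy_peak_rate = energy_passbands[np.argmax(count_rates)]

speed_guess = np.sqrt(2 * energy_peak_rate * EV_TO_J / PROT_MASS) / 1000  # km/s
dens_guess = 5.0  # cm^-3, nominal
temp_guess = 1e5  # K, nominal
initial_param_guess = np.array([speed_guess, dens_guess, temp_guess])
```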
```python
energy_data = pd.read_csv(
    f"{imap_module_directory}/tests/ialirt/test_data/ialirt_test_data.csv"
)
count_rates = energy_data["Count Rates [Hz]"].to_numpy()
```
So will count_rates be a lookup table that is used in operations? Or is it just test data? If it's a lookup table we should put it in the ialirt (not test) directory.
@bishwassth are the energy steps in the spreadsheet you sent me (in ialirt_test_data.csv) going to be the ones used for real-time processing? Should I use them as a lookup table for this parameter optimization?
The first 63 energy steps are very close to what will be used for real-time processing, but may not be exactly the same. The energies for real-time processing will come from the ESA unit conversion ADP and will be in the form of a lookup table (e.g., Table 3 in the algorithms document). BTW, could you confirm whether the I-ALiRT code is using the "ESA unit conversion ADP" or not? If not, we have to supply the energies in a different lookup table.
I see the file called imap_swapi_esa-unit-conversion_20250211_v000.csv in our lookup table folder for SWAPI. Is this what you mean? And are the energy passbands listed in this file? None of the columns are named similarly to that.
Yes, that is the correct file to use. You have to read the "Energy" column in this file for ESA steps # 0-62. Note that the energy values for the first 34 ESA steps (# 0-33) are fixed to "1,163", which will be updated to realistic values in space.
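A sketch of reading those energies; the file path and the exact "Energy" column name are assumptions based on this thread:

```python
import pandas as pd

# Read ESA steps # 0-62 from the unit conversion lookup table.
esa_table = pd.read_csv("imap_swapi_esa-unit-conversion_20250211_v000.csv")
energy_passbands = esa_table.loc[0:62, "Energy"].to_numpy()
```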
Another question: how close should the orange and blue lines be to each other? Did the instrument team give any criteria for success? Could you plot the difference of the two as well?
"pseudo_temperature": [], | ||
} | ||
|
||
if count_rates.ndim > 1: |
Are we expecting both array and float inputs? Can you pick one or the other and fix the input type here?
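One way to collapse the two cases, sketched here rather than taken from the PR, is to promote the input to 2-D up front so a single loop handles both:

```python
import numpy as np
from scipy.optimize import curve_fit

# Promote a single sweep (1-D) to shape (1, n_energies).
count_rates = np.atleast_2d(count_rates)

for sweep in count_rates:
    sol = curve_fit(count_rate, energy_passbands, sweep, p0=initial_param_guess)
```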
My main comment is on the implementation of the non-linear fitting.
solution_dict["pseudo_density"].append(sol[0][1]) | ||
solution_dict["pseudo_temperature"].append(sol[0][2]) | ||
else: | ||
sol = curve_fit(count_rate, energy_passbands, count_rates, initial_param_guess) |
I suggest doing a weighted fit (inverse variance weighted). The "Count Rates Error" in the CSV file (3rd column) can be used as weights. Also, suggest checking the reduced chi-square value for the goodness of the fit.
In addition, are you using only 6 energy bins around the maximum count rates (1 where max count rate is + 3 on the left + 2 on the right) for this fitting?
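A sketch of that weighted fit with a reduced chi-square check; count_rate_errors is assumed to come from the "Count Rates Error" CSV column:

```python
import numpy as np
from scipy.optimize import curve_fit

popt, pcov = curve_fit(
    count_rate,
    energy_passbands,
    count_rates,
    p0=initial_param_guess,
    sigma=count_rate_errors,  # per-point 1-sigma errors -> inverse-variance weights
    absolute_sigma=True,
)

# Goodness of fit: reduced chi-square, dof = N points - n fitted parameters.
residuals = count_rates - count_rate(energy_passbands, *popt)
chi2 = np.sum((residuals / count_rate_errors) ** 2)
reduced_chi2 = chi2 / (len(count_rates) - len(popt))
```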
I don't have a formula to calculate the count rate error in real-time processing, so unless that gets created and I can implement it in the code, it doesn't make sense to me to add the errors as weights here when I only have one instance of them in the test data you sent me. Unless the count rate error remains the same for every passband every time?
I can work on getting the chi-square value from curve_fit, but it's surprisingly not a straightforward calculation.
Yes, the fit you see in the plots shown above includes only those 6 energy steps.
I was under the impression that the SWAPI science processing code up to level 2 was used for SWAPI I-ALiRT, though just for the coarse scan (not the fine scans), and that it creates arrays of energy (which is always fixed for coarse scans), count rates, and count rate errors for each 12 s sweep. If it is only using the part of the code that creates arrays of energy and count rates, I can provide a conversion factor from count rates to count rate errors. Of course, the count rate error doesn't remain the same over time.
I was mistaken; there is code available for me to calculate the count rate error. Right now, the SWAPI science processing code calculates it as the sqrt() of the count rate. Is that the same conversion factor I should use for this?
The error in the count rate is calculated as: sqrt(counts)/TIME_PER_BIN, where TIME_PER_BIN = 12/72 s and counts are produced from L1 science processing.
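In code, that conversion is a one-liner (counts here stands for the per-bin counts array from L1 science processing):

```python
import numpy as np

TIME_PER_BIN = 12 / 72  # seconds per energy bin in a 12 s, 72-bin sweep
count_rate_errors = np.sqrt(counts) / TIME_PER_BIN  # Hz
```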
@bishwassth can you look into the scipy curve_fit function and tell me how you'd like me to incorporate the count rate errors as weights? I tried to pass them in as the sigma parameter, but when I did, the function errored out by reaching the maximum number of function calls without finding the optimal parameters. I'm not sure if sigma is the right parameter to assign the errors to, or if it's some other issue.
Ex:

```python
sol = curve_fit(
    f=count_rate,
    xdata=energy_passbands,
    ydata=current_sweep_count_rates,
    p0=initial_param_guess,
    sigma=current_sweep_count_rate_errors,
)
```

where a count rate value is 4167.6646 and the corresponding error value is 157.97491. And got this error: RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
I am not very familiar with the Python libraries, but according to the scipy curve_fit documentation, sigma=count_rate_error seems to be the correct way to use the count rate error as a weight. I found this page, where an example of this type of implementation is explained.
Could you pass the following array values to curve_fit with the same initial parameter guess you are using, and let me know the output (sol)?

```python
xdata = [1257.56, 1369.26, 1489.38, 1621.08, 1764.61, 1921.41]
ydata = [245.95, 671.87, 1253.75, 1373.73, 869.83, 257.95]
sigma = [38.34, 63.37, 86.57, 90.62, 72.11, 39.27]
```
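For reference, a runnable version of that request, assuming count_rate and initial_param_guess as defined in this PR:

```python
import numpy as np
from scipy.optimize import curve_fit

xdata = np.array([1257.56, 1369.26, 1489.38, 1621.08, 1764.61, 1921.41])
ydata = np.array([245.95, 671.87, 1253.75, 1373.73, 869.83, 257.95])
sigma = np.array([38.34, 63.37, 86.57, 90.62, 72.11, 39.27])

sol = curve_fit(
    f=count_rate,
    xdata=xdata,
    ydata=ydata,
    p0=initial_param_guess,
    sigma=sigma,
)
print(sol[0])  # optimized [speed, density, temperature]
```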
```python
boltz = 1.380649e-23  # Boltzmann constant, J/K
at_mass = 1.6605390666e-27  # atomic mass, kg
prot_mass = 1.007276466621 * at_mass  # mass of proton, kg
eff_area = 3.3e-5 * 1e-4  # effective area, meters squared
```
Can we edit this code directly later to update the value of the effective area?
Yes.
@laspsandoval This is the percent difference between the expected count rates and the actual count rates calculated with the optimized parameters for the 6 points surrounding the maximum count rate. @bishwassth is this satisfactory?
Force-pushed from 6939480 to cc0cac6.
Change Summary
Overview
Add the main piece of SWAPI I-ALiRT processing: a beefy optimization function for coincidence count rates. You can compare the code to the science document located here, if desired.
New Files
Updated Files
None
Testing
I added unit tests to verify the results aren't None, but verifying the output values against real data was too difficult to implement (mainly because I don't have an exact set of input data that produces a known set of optimized parameters).
However, I did graph my results, comparing the model output using the optimized parameters this code found against the output using the parameters from the science document, fitting only the six highest count rate values (as the science team did):

The blue line is the model using the optimized parameters from the science document, the orange line is the model using the optimized parameters that this code found, and the blue points are the provided data points. As you can see, they are very close!