Log and Uniform Binning for Random Forest Classifier #33
Open
theabbybault wants to merge 36 commits into LSSTDESC:master
Member
Thank you for your entry, @theabbybault! This is super interesting! So if I understand this right, aside from anything else, one should prefer log binning :-)
merged all methods into one file
merged all methods into one file
merged all methods into one file
merged all methods into one file
File has options for log, random (previously called uniform) and combining bins, as well as setting a seed.
no longer needed. adding all methods to one file eliminated the need to change this file.
removes unnecessary files and update others to include new updates to code
Author
fix the error of plots not going to the right folder
new binning method is from David Kirkby. Calculates the bin edges so they are equally spaced in comoving distance (I've called it 'chi').
Author
This last update adds a new binning method (code written by David Kirkby, here), where the bins are equally spaced in comoving distance, called 'chi'. I've also updated the notebook to include some plots for this method. Focusing on the FOM_DETF_3x2 score, a table of scores for each method is shown below:
A summary of what I've done:
All plots posted here can be found in the notebook funbins_results.ipynb. The plots showing the bins can be found in results under the selected method (and then under jax).
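
For readers unfamiliar with the 'chi' binning mentioned above, here is a minimal sketch of the idea: place bin edges at equal steps in comoving distance and map them back to redshift. This is not David Kirkby's code. The function name `chi_bin_edges`, the flat ΛCDM cosmology with `omega_m=0.3`, and the trapezoid integration grid are all assumptions made for illustration.

```python
import numpy as np

def chi_bin_edges(z, n_bins, omega_m=0.3, n_steps=2048):
    """Redshift bin edges equally spaced in comoving distance (hypothetical helper).

    Assumes a flat LambdaCDM cosmology; distances are in units of c / H0,
    which is enough because only the relative spacing matters here.
    """
    z = np.asarray(z, dtype=float)
    # Tabulate chi(z) on a fine redshift grid with a cumulative trapezoid rule.
    grid = np.linspace(0.0, z.max(), n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + grid) ** 3 + (1.0 - omega_m))
    steps = 0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(grid)
    chi_grid = np.concatenate([[0.0], np.cumsum(steps)])
    # Equal steps in chi between the smallest and largest redshift in the sample.
    chi_lo, chi_hi = np.interp([z.min(), z.max()], grid, chi_grid)
    chi_edges = np.linspace(chi_lo, chi_hi, n_bins + 1)
    # chi(z) is monotonic, so interpolation inverts it back to redshift.
    return np.interp(chi_edges, chi_grid, grid)

# Example with synthetic redshifts (not the challenge data).
z_demo = np.random.default_rng(0).uniform(0.1, 3.0, 1000)
chi_edges_demo = chi_bin_edges(z_demo, n_bins=5)
```

Because comoving distance grows more slowly with redshift at high z, bins that are equal in chi come out wider in redshift at the high-z end.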





I looked at how different binning schemes affect the scores of the random forest classifier, focusing mainly on log binning and uniform binning. For the log binning, I created evenly spaced numbers on a log scale and then assigned galaxies to bins based on their percentile. For the uniform binning, I drew the bins from a uniform random distribution and sorted the galaxies into them. The resulting binning with 10 bins is shown below for each method.


log:
uniform:
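
The two schemes described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names, the lower percentile `p_min=1.0`, and the use of `np.percentile` to turn percentiles into redshift edges are all hypothetical choices.

```python
import numpy as np

def log_percentile_edges(z, n_bins, p_min=1.0):
    """Bin edges at percentiles evenly spaced on a log scale (assumed scheme)."""
    # n_bins log-spaced percentiles from p_min to 100, plus 0 as the lowest edge.
    percentiles = np.concatenate(
        [[0.0], np.logspace(np.log10(p_min), 2.0, n_bins)]
    )
    return np.percentile(z, percentiles)

def random_percentile_edges(z, n_bins, seed=None):
    """Bin edges at percentiles drawn from a uniform random distribution."""
    rng = np.random.default_rng(seed)
    inner = np.sort(rng.uniform(0.0, 100.0, n_bins - 1))
    percentiles = np.concatenate([[0.0], inner, [100.0]])
    return np.percentile(z, percentiles)

def assign_bins(z, edges):
    """Label each galaxy with its bin index, 0 .. n_bins - 1."""
    # Digitizing against the inner edges keeps the extremes in the end bins.
    return np.digitize(z, edges[1:-1])

# Example with synthetic redshifts (not the challenge data).
z_demo = np.random.default_rng(0).uniform(0.1, 3.0, 1000)
log_edges = log_percentile_edges(z_demo, n_bins=5)
log_labels = assign_bins(z_demo, log_edges)
rand_edges = random_percentile_edges(z_demo, n_bins=5, seed=1)
```

Passing a fixed `seed` makes the random edges reproducible from run to run.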
I also tried combining some of the bins for both the log and uniform binning. I started with 5 bins and combined 2 of them, leaving 4 bins. For each method, I was only able to combine bin 0 with bin 1, 2, or 3. The combined bins were renamed, so the plot legends might be a bit confusing (without the renaming, the calculations threw an error). An example of this combined binning for uniform binning is:

The notebook showing the results is called 'log_uniform_and_combined_bins.ipynb' and is in the main part of the repository. Below is an example of a plot showing the scores for each metric calculated using Jax and log binning.

There are more plots similar to this in the notebook.
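
Returning to the combined bins described earlier, the merge-and-rename step might look like the sketch below. The helper name `merge_bins` is hypothetical, and the reading that the renaming exists to keep bin labels contiguous (0, 1, 2, ...) is an assumption about why the unrenamed version failed.

```python
import numpy as np

def merge_bins(labels, keep, absorb):
    """Fold bin `absorb` into bin `keep`, then renumber bins without gaps."""
    labels = np.asarray(labels).copy()
    labels[labels == absorb] = keep
    # np.unique's inverse indices give each surviving bin a label 0 .. n - 1.
    _, contiguous = np.unique(labels, return_inverse=True)
    return contiguous

# Five bins (0..4); merging bin 1 into bin 0 leaves four bins labelled 0..3.
merged = merge_bins([0, 1, 2, 3, 4, 1], keep=0, absorb=1)
```

After the merge, the original bins 2, 3, and 4 appear as 1, 2, and 3, which is why legends on the combined-bin plots do not match the original bin numbers.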