
GPz classifier #31

Open
pwhatfield wants to merge 7 commits into LSSTDESC:master from pwhatfield:master


@pwhatfield

@pwhatfield pwhatfield commented Aug 27, 2020

A tomographic bin classifier based on the GPz photometric code

Not yet modified from the random forest version, but coming
Now actually is based on GPz
Two python2.7 scripts required
@pwhatfield
Author

Now updated to a basic version that uses GPz for classification. It needs classifier_train_GPz.py and classifier_predict_GPz.py. I have some further tweaks that I will try to include if time allows...

@EiffL EiffL added the entry Challenge entry label Aug 31, 2020
@EiffL
Member

EiffL commented Aug 31, 2020

Hi @pwhatfield, thanks for your entry! Feel free to document your approach and some preliminary results you may have in this thread, to help people get a sense of the landscape of the competition :-)

Few aesthetic changes
Few aesthetic changes
Few aesthetic changes
@pwhatfield
Author

[Image: Screen Shot 2020-09-04 at 18 43 56]

I'm in the process of generating something more comprehensive, but here is a very small snapshot of some preliminary results. The method is based on GPz machine-learned photo-z, followed by a simple binning scheme that tries to put the same number of galaxies in each bin. It doesn't exploit the fact that you don't have to base colour-space divisions on redshift, or that you can have more interesting redshift bins. There is an option to selectively exclude galaxies based on uncertainty, e.g. galaxies in the test set that are closer than 1σ to a bin boundary. More interestingly, there is an option to selectively remove galaxies for which there is extrapolation: a parameter describes how much the ML is having to extrapolate, and results with too much extrapolation should not be used, so you can exclude low-uncertainty but high-extrapolation galaxies. This probably doesn't make a huge amount of difference in this case, where the test and training data are drawn from the same distribution, but it might be useful in a scenario where the test data were very different.
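
To make the binning and selection steps concrete, here is a minimal sketch in Python with NumPy. It is not the code from this PR. The function names, the `extrap` array and the `max_extrap` threshold are hypothetical, and it assumes that a larger `extrap` value means more extrapolation. It only illustrates equal-number binning on predicted redshifts, plus the optional cuts on distance to a bin boundary and on extrapolation.

```python
import numpy as np


def assign_equal_count_bins(z_pred, n_bins):
    """Assign galaxies to n_bins bins holding roughly equal numbers of objects."""
    # Interior bin edges are quantiles of the predicted redshifts.
    edges = np.quantile(z_pred, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # searchsorted returns, for each redshift, the index of the bin it falls in.
    return np.searchsorted(edges, z_pred, side="right"), edges


def keep_mask(z_pred, sigma, edges, n_sigma=1.0, extrap=None, max_extrap=None):
    """Boolean mask of galaxies that pass the optional selection cuts.

    A galaxy is dropped if it lies closer than n_sigma * sigma to any bin
    edge, or (when both extrap and max_extrap are given) if its hypothetical
    extrapolation measure exceeds max_extrap.
    """
    keep = np.ones(z_pred.shape, dtype=bool)
    if edges.size > 0:
        # Distance from each galaxy to its nearest bin edge.
        dist = np.min(np.abs(z_pred[:, None] - edges[None, :]), axis=1)
        keep &= dist >= n_sigma * sigma
    if extrap is not None and max_extrap is not None:
        # Assumption: larger values mean the model is extrapolating more.
        keep &= extrap <= max_extrap
    return keep


# Example usage with made-up predicted redshifts and uncertainties.
z_demo = np.linspace(0.0, 1.0, 100)
sigma_demo = np.full(z_demo.shape, 0.02)
bins_demo, edges_demo = assign_equal_count_bins(z_demo, n_bins=4)
mask_demo = keep_mask(z_demo, sigma_demo, edges_demo)
```

Galaxies where `mask_demo` is `False` would simply be left out of every tomographic bin. Whether that improves the metric depends on how much sample size is traded for cleaner bins.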

@joezuntz
Collaborator

joezuntz commented Oct 9, 2020

@pwhatfield - could you give me some idea of the timing you found when creating this method? I'm running it and it looks like it will take quite a long time (each iteration of the minimizer is taking 10-15 minutes). No problem if that's expected, I just wanted to check I didn't mess anything up in the merge.

@joezuntz
Collaborator

joezuntz commented Oct 9, 2020

Although it seems to be speeding up with each subsequent iteration.

@pwhatfield
Author

Hmm, what bands and what size of training set? And this was for one iteration, not the whole training? I don't think it was ever taking 15 min per step for me, maybe up to 2 min for a large training set and many bands. I think I had the number of basis functions set to m = 100; it could probably be safely taken down to m = 50 if that makes a big difference...

