Not yet modified from the random forest version, but coming
Now actually is based on GPz
Two Python 2.7 scripts required
Now updated to a basic version that uses GPz for classification. It needs classifier_train_GPz.py and classifier_predict_GPz.py. I have some further tweaks that I will try to include if time allows...
Hi @pwhatfield, thanks for your entry! Feel free to document your approach and some preliminary results you may have in this thread, to help people get a sense of the landscape of the competition :-)
Few aesthetic changes
In the process of generating something more comprehensive, but here is a very small snapshot of some preliminary results. The method uses GPz machine-learned photo-z, followed by a simple binning scheme that tries to put the same number of galaxies in each bin. It doesn't exploit the fact that colour-space divisions don't have to be based on redshift, or that you can have more interesting redshift bins.

There is an option to selectively drop galaxies based on their uncertainty, e.g. test-set galaxies that lie closer than 1 sigma to a bin boundary. More interestingly, there is an option to drop galaxies for which GPz is extrapolating: a parameter describes how much the ML has to extrapolate, and results with too much extrapolation should not be used, so you can exclude galaxies that have low uncertainty but high extrapolation. This probably doesn't make much difference here, where the test and training data are drawn from the same distribution, but it might be useful in a scenario where the test data were very different.
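A minimal sketch of the binning scheme and the two cuts described above, assuming NumPy and per-galaxy arrays of GPz point estimates (`z_phot`) and 1-sigma uncertainties (`sigma`). This is not the code in classifier_train_GPz.py / classifier_predict_GPz.py. The function names, the `extrap` array (a stand-in for whatever extrapolation measure GPz reports) and the `max_extrap` threshold are all hypothetical. Bin edges are taken from quantiles of the photo-z distribution so that each bin gets roughly equal counts, and rejected galaxies are given bin index -1.

```python
import numpy as np


def equal_count_edges(z_phot, n_bins):
    """Bin edges placing (roughly) the same number of galaxies in each bin."""
    # Quantiles at 0, 1/n, ..., 1 of the photo-z distribution.
    return np.quantile(z_phot, np.linspace(0.0, 1.0, n_bins + 1))


def assign_bins(z_phot, sigma, edges, n_sigma=None, extrap=None, max_extrap=None):
    """Return a bin index per galaxy, or -1 for galaxies that are cut.

    n_sigma:    drop galaxies closer than n_sigma * sigma to an interior edge.
    extrap:     hypothetical per-galaxy extrapolation measure; galaxies with
                extrap > max_extrap are dropped.
    """
    z_phot = np.asarray(z_phot, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    inner = np.asarray(edges, dtype=float)[1:-1]  # interior edges only
    # np.digitize against the interior edges gives indices 0 .. n_bins - 1.
    bins = np.digitize(z_phot, inner)
    keep = np.ones(z_phot.shape, dtype=bool)
    if n_sigma is not None and inner.size > 0:
        # Distance from each galaxy to its nearest interior bin boundary.
        dist = np.min(np.abs(z_phot[:, None] - inner[None, :]), axis=1)
        keep &= dist >= n_sigma * sigma
    if extrap is not None and max_extrap is not None:
        # Exclude galaxies where the (hypothetical) extrapolation score is too high.
        keep &= np.asarray(extrap, dtype=float) <= max_extrap
    return np.where(keep, bins, -1)


# Toy usage: ten galaxies, two bins, an uncertainty of 0.1 on each.
z = np.arange(10) / 10.0
edges = equal_count_edges(z, 2)
print(assign_bins(z, np.full(10, 0.1), edges, n_sigma=1.0))
```

In the toy example the single interior edge sits at about 0.45, so the galaxies at z = 0.4 and z = 0.5 fall within 1 sigma of it and are rejected, while the other eight are split four and four. The extrapolation cut is applied independently of the uncertainty cut, which is what allows low-uncertainty but high-extrapolation galaxies to be excluded.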
@pwhatfield - could you give me some idea of the timing you found when creating this method? I'm running it and it looks like it will take quite a long time (each iteration of the minimizer is taking 10-15 minutes). No problem if that's expected, I just wanted to check I didn't mess anything up in the merge.
Although it seems to be speeding up with each subsequent iteration.
Hmm, what bands and what size of training set? And was that for one iteration, not the whole training? I don't think it ever took 15 min per step for me; maybe up to 2 min for a large training set with many bands. For the number of basis functions I think I had m = 100, which could probably be safely reduced to m = 50 if that makes a big difference...
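As a rough guide to the m = 100 vs m = 50 trade-off: assuming the per-iteration training cost scales as $\mathcal{O}(nm^2)$ for $n$ training galaxies and $m$ basis functions (the usual scaling for sparse GP approximations; treat it as an assumption for GPz specifically), halving $m$ at fixed $n$ gives

```math
\frac{t(m=50)}{t(m=100)} \approx \left(\frac{50}{100}\right)^{2} = \frac{1}{4},
```

i.e. roughly a fourfold speed-up per iteration, before any change in the number of iterations the minimizer needs to converge.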

A tomographic bin classifier based on the GPz photometric redshift code