- adaptation to different distributions of angles
- hysteresis in adaptation to wide vs narrow distributions
- continual learning on different distributions
- Compare the results of different trainers, with and without saving the trainer state.
- Saving the trainer state prevents initialization pollution, while saving only the network weights does not! (see the checkpointing sketch after this list)
- Experiment with alternative metrics, for example calibration curves (a sketch follows below)
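
For the trainer-state item above, here is a minimal checkpointing sketch, assuming a PyTorch model and optimizer. The idea that "trainer state" covers optimizer moments and RNG state is my assumption about what is meant here, and the function names are hypothetical.

```python
# Hypothetical sketch (PyTorch assumed): contrast a weights-only checkpoint
# with a fuller "trainer state" checkpoint that also captures optimizer and
# RNG state, so a resumed run does not re-mix in fresh initialization noise.
import torch


def save_weights_only(model, path):
    # Only the network parameters survive; optimizer moments and RNG state
    # are re-initialized on resume, which can pollute the continued run.
    torch.save({"model": model.state_dict()}, path)


def save_trainer_state(model, optimizer, path):
    # Fuller trainer state: parameters, optimizer moments, and RNG state.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "torch_rng": torch.get_rng_state(),
        },
        path,
    )


def load_trainer_state(model, optimizer, path):
    # Restore everything so the resumed run continues where it left off.
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    torch.set_rng_state(ckpt["torch_rng"])
```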
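For the calibration-curve idea, a minimal sketch using scikit-learn's `calibration_curve`; the label and probability arrays here are placeholder data, not results from the project.

```python
# Sketch of a calibration-curve metric (scikit-learn assumed).
import numpy as np
from sklearn.calibration import calibration_curve

# Placeholder binary labels and predicted probabilities.
y_true = np.random.randint(0, 2, size=1000)
y_prob = np.clip(0.7 * y_true + 0.3 * np.random.rand(1000), 0, 1)

# Fraction of positives vs. mean predicted probability per bin; a perfectly
# calibrated model lies on the diagonal.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for fp, mp in zip(frac_pos, mean_pred):
    print(f"predicted {mp:.2f} -> observed {fp:.2f}")
```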
I'm at a point where there are several things I need to accomplish to push to the next phase of the project:
- termination conditions for the iteration: based on the variance of the distributions, or perhaps the KL divergence between successive iterates (see the sketch after this list)
- Add mean reversion to the iteration to prevent the growth of noise between iterates. Ideally, we want this to act on a small scale only (a sketch follows below).
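
For the termination condition, a sketch of a KL-based stopping check, assuming the iterates can be summarized as samples (e.g. of angles) binned into histograms; the bin count and tolerance are arbitrary placeholders.

```python
# Hypothetical termination check: stop iterating when the KL divergence
# between successive sample distributions drops below a tolerance.
import numpy as np
from scipy.stats import entropy


def histogram_kl(prev_samples, curr_samples, bins=50, eps=1e-12):
    # Shared bin edges so the two histograms are comparable.
    edges = np.histogram_bin_edges(
        np.concatenate([prev_samples, curr_samples]), bins=bins
    )
    p, _ = np.histogram(prev_samples, bins=edges, density=True)
    q, _ = np.histogram(curr_samples, bins=edges, density=True)
    # entropy(p, q) computes KL(p || q); eps avoids division by zero in empty bins.
    return entropy(p + eps, q + eps)


def should_terminate(prev_samples, curr_samples, tol=1e-3):
    return histogram_kl(prev_samples, curr_samples) < tol
```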
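For the mean-reversion item, one possible reading of "act on a small scale only" is to pull each iterate a small fraction of the way toward a local moving average, damping high-frequency noise while leaving large-scale structure alone; the kernel width and strength below are assumptions.

```python
# Sketch of small-scale mean reversion: revert toward a *local* mean so that
# per-iterate noise does not compound, without overriding coarse structure.
import numpy as np


def local_mean(x, k=5):
    # Moving average over a window of k samples (simple box kernel).
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")


def mean_revert(x_new, alpha=0.1, k=5):
    # Pull each value a fraction alpha of the way back toward its local mean.
    return x_new + alpha * (local_mean(x_new, k) - x_new)
```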