[BUG]: Loss saved in hall of fame is not identical to loss recalculated based on prediction #816
Comments
@BrotherHa is it okay if you paste the code inline in the markdown block? This is so that search indexes like Google can correctly find this issue if other people have a similar problem.
Hello @MilesCranmer
I have looked into where the issue occurs. For most individuals of the regression, the loss values from the equations_ table and the recalculated ones are identical to very high precision, but in some rare cases they diverge significantly, as in the example.
Is it only when reloading from a pickle file?
And are you sure you are computing the loss in a way completely identical between the Julia and the Python side of things?
About the loss functions I am very sure. The only issue here might be differences caused by the numerical precision of the calculation. And for more than 90% of all individuals, both methods calculate exactly the same loss.
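To put a rough number on the precision hypothesis, here is a minimal sketch that computes the same loss once in 64-bit and once in 32-bit floats. It assumes a plain mean squared error and synthetic data, since the custom loss and dataset from the notebook are not shown in this thread. As far as I know, PySR searches in 32-bit floats by default (the `precision` parameter), which is why float32 is used for the comparison.

```python
import numpy as np

# Synthetic stand-ins for the real targets and predictions (assumption:
# the actual data and custom loss from the notebook are not available here).
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
pred = y + rng.normal(scale=0.1, size=y.size)


def mse(pred, y, dtype):
    """Mean squared error with all arithmetic done in the given dtype."""
    pred = pred.astype(dtype)
    y = y.astype(dtype)
    return float(np.mean((pred - y) ** 2))


loss64 = mse(pred, y, np.float64)
loss32 = mse(pred, y, np.float32)
rel_diff_precision = abs(loss32 - loss64) / loss64

# Relative gap between the two values reported in this issue.
reported_hof = 0.01720243
reported_recalc = 0.022473989882165576
rel_diff_reported = abs(reported_recalc - reported_hof) / reported_hof

print(f"float32 vs float64 relative difference: {rel_diff_precision:.2e}")
print(f"reported relative difference:           {rel_diff_reported:.2e}")
```

On well-scaled data like this, the float32 effect should be many orders of magnitude smaller than the roughly 30% gap reported above, so rounding alone seems an unlikely explanation unless the expression is numerically ill-conditioned on the real data.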
Just to confirm, it is only when you load the model from a saved checkpoint that this happens? It’s not when you use |
Do you see any pattern in the ones that don’t match? e.g., Are they very complex expressions?
I do not see any pattern in the equations that show the error. They are not the most complex ones, nor do they contain particularly complex operators.
It also happens when using fit for the first time. I will try to reload and refit when I find time.
But it's only when you have a model loaded from disk, right? Like if you did:
    model = PySRRegressor(warm_start=True)  # fresh model; NOT loaded from disk
    model.fit(X, y)
    #= ... =#
    model.fit(X, y)
The problem first occurred when fitting a new model. Directly after the regression finished, I called predict() and calculated the loss for the training, test, and total datasets based on that. Before that, I saved the output of model.get_best(). The value from get_best() and the manual calculation for the best individual show the behavior in a less significant way (0.014545 instead of 0.014536), but the difference is still larger than for most other values, which match closely. For the example given in my first message here, I also called the predict() method just after the initial run and calculated the loss as 0.022473989882165576, but I did not save the loss from model.equations_ separately. However, in the hall of fame CSV file the loss is also saved as 0.01720243, the same value that is output when I call model.equations_ from the loaded model.
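The comparison described here can be sketched as a small helper that recomputes the loss for every individual and flags those that disagree with the stored value. This is only an illustration under assumptions: `find_loss_mismatches` and `mse` are hypothetical names, the loss is a plain mean squared error rather than the custom loss from the notebook, and the data is a toy example.

```python
import numpy as np


def mse(pred, y):
    """Stand-in loss (assumption: the real custom loss is not shown here)."""
    return float(np.mean((np.asarray(pred) - np.asarray(y)) ** 2))


def find_loss_mismatches(stored_losses, predictions, y, loss_fn=mse, rtol=1e-4):
    """Return (index, stored, recalculated) for every individual whose
    recalculated loss differs from the stored loss by more than rtol."""
    mismatches = []
    for i, (stored, pred) in enumerate(zip(stored_losses, predictions)):
        recalculated = loss_fn(pred, y)
        if not np.isclose(recalculated, stored, rtol=rtol, atol=0.0):
            mismatches.append((i, float(stored), recalculated))
    return mismatches


# Toy example: two "individuals", the second with a deliberately wrong stored loss.
y = np.array([1.0, 2.0, 3.0, 4.0])
predictions = [y + 0.1, y - 0.2]
stored_losses = [mse(predictions[0], y), 0.05]

mismatches = find_loss_mismatches(stored_losses, predictions, y)
for index, stored, recalculated in mismatches:
    print(f"individual {index}: stored={stored:.6g}, recalculated={recalculated:.6g}")
```

With a fitted model, `stored_losses` would presumably come from `model.equations_["loss"]` and each entry of `predictions` from `model.predict(X, index=i)`, evaluated on the training data only, since that is the data the stored loss refers to. Running this right after `fit` and again after reloading from disk would show whether the same individuals are flagged in both cases.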
@BrotherHa would it be possible for you to make a MWE of this issue so I can try to reproduce it? No worries if not. It's a bit tricky because the unit tests already check for warm starts from files and they seem to pass. So I'm puzzled as to what conditions trigger this.
Hello,
I have identified an issue with some PySR regressions that I ran. The loss saved in the hall of fame is not identical to the loss calculated with the same dataset and the same custom loss function based on the PySR prediction. I have documented the issue in the attached notebook file.
Have similar problems occurred before, or do you have any idea what might be the reason for this?
As an additional test, I was thinking about recalculating the loss of the individual inside PySR, to be able to reproduce the potentially faulty loss calculation. Is there a way to do so?
Best regards and thanks in advance!
Jupyter Notebook Code:
-> Loss recalculated and loss calculated in PySR are not identical!
Version
1.3.1
Operating System
Windows
Package Manager
Conda
Interface
Other (specify below)
Relevant log output
Extra Info
loss_difference_pysr.pdf