
refactor: Output format of conformal predictions #221

Draft · wants to merge 6 commits into main
Conversation

@baggiponte (Collaborator) commented on Jun 6, 2024

@FBruzzesi I am starting to work on #135; I would like your opinion on the work so far 😊 I would also like to hear how you would test the conformal prediction code.

(To keep track: this is preparatory work for #39)


@baggiponte self-assigned this on Jun 6, 2024
@baggiponte added the refactor (Code change that neither fixes a bug nor adds a feature) and forecasting (Forecasters and adapters) labels on Jun 6, 2024
@FBruzzesi (Contributor) left a comment


> would like your opinion on the work so far

It looks good in general. My understanding from our private conversation was that you also wanted to change the format from long to wide.

I left a couple of considerations in the codebase itself, though, and I believe the type hints may require some more attention if you care about those.

> Also I would like to hear your opinion on how you would test the conformal prediction code.

For behavioral testing, I would like to see that the point prediction in regression always lies between the lower and upper bounds, which is not the case with the current implementation.

From a theoretical point of view, one could also check that the actual value falls between the bounds at least x percent of the time; this would be a flaky test, though :)


# Make alpha base 100
y_pred_quantiles = y_pred_quantiles.with_columns(
    (pl.col("quantile") * 100).cast(pl.Int16)
)
@FBruzzesi (Contributor) commented on Jun 6, 2024


If you have more than 2 decimal places, the cast just truncates the decimal values 🤔

Example:

import polars as pl

pl.DataFrame({"a": [0.111, 0.11]}).with_columns((pl.col("a")*100).cast(pl.Int16))
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ i16 │
╞═════╡
│ 11  │
│ 11  │
└─────┘

@baggiponte (Collaborator, Author) commented

That's correct; fortunately, I still have to change that bit of the code. This column won't exist once I'm done ✔️

    return (0.1, 0.9)
elif len(alphas) != 2:
    raise ValueError("alphas must be a list of length 2")
elif not all(0 < alpha < 1 for alpha in alphas):
@FBruzzesi (Contributor) commented

Should we also check that the sequence is sorted?

@baggiponte (Collaborator, Author) commented


IIRC the function returns the alphas sorted, so that should be OK. Am I correct?
