
fix: restrict auto-format to datetime-like indices to prevent data co…#133

Open
Saloni-0465 wants to merge 1 commit into sktime:main from Saloni-0465:fix/auto-format-non-datetime-index


@Saloni-0465
Contributor

Reference Issues/PRs

N/A


What does this implement/fix? Explain your changes.

Before: When formatting a data handle, the code inferred a calendar frequency and filled gaps with pd.date_range and reindex for every kind of index. That is only safe for datetime-like indexes. For plain row indexes (such as RangeIndex or integer indexes), it could change the number of rows, introduce missing values, or otherwise corrupt the series while still reporting success.

After: The calendar step runs only for datetime indexes (DatetimeIndex) and period indexes (PeriodIndex). For any other index type, we skip the step and record why in changes_made (calendar_gap_fill_skipped and calendar_gap_fill_reason).
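
For reviewers who want the shape of the guard without opening the diff, here is a minimal sketch of the idea. It is not the code in this PR: the helper name `fill_calendar_gaps`, the explicit `freq` argument, and the wording of the reason string are assumptions made for illustration. Only the index-type check and the two `changes_made` keys come from the description above.

```python
import pandas as pd


def fill_calendar_gaps(y: pd.Series, freq: str, changes_made: dict) -> pd.Series:
    """Hypothetical helper: gap-fill only when the index is datetime-like."""
    idx = y.index

    if isinstance(idx, pd.DatetimeIndex):
        # Build the full calendar between the first and last timestamp,
        # then reindex so missing dates appear as NaN rows.
        full = pd.date_range(idx.min(), idx.max(), freq=freq)
        return y.reindex(full)

    if isinstance(idx, pd.PeriodIndex):
        # A PeriodIndex already carries its own frequency.
        full = pd.period_range(idx.min(), idx.max(), freq=idx.freq)
        return y.reindex(full)

    # Any other index (RangeIndex, integer Index, ...) is left untouched,
    # and the skip is recorded. The reason text is an illustrative assumption.
    changes_made["calendar_gap_fill_skipped"] = True
    changes_made["calendar_gap_fill_reason"] = (
        f"index type {type(idx).__name__} is not datetime-like"
    )
    return y
```

With this sketch, a series on a default RangeIndex comes back with the same number of rows and the skip recorded in `changes_made`, while a daily DatetimeIndex with one missing day gains a single NaN row.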

We also catch the errors pd.infer_freq raises when pandas cannot infer a frequency (for example, when the index has fewer than three timestamps).
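
A small sketch of that kind of wrapper, for illustration only. The name `safe_infer_freq` is hypothetical, and the exact exception types caught in the PR may differ. `pd.infer_freq` raises `ValueError` when given fewer than three timestamps; `TypeError` is included here as an assumption, to cover non-datetime input.

```python
import pandas as pd


def safe_infer_freq(index):
    """Hypothetical wrapper: return the inferred frequency string, or None."""
    try:
        return pd.infer_freq(index)
    except (ValueError, TypeError):
        # pandas refused to infer (e.g. too few timestamps): treat as unknown.
        return None


# Two timestamps are too few for pandas to infer a frequency.
short = pd.DatetimeIndex(["2024-01-01", "2024-01-02"])
print(safe_infer_freq(short))  # None

# A regular daily index is inferred normally.
daily = pd.date_range("2024-01-01", periods=5, freq="D")
print(safe_infer_freq(daily))  # D
```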

We fixed a crash when saving metadata: not every index type has a .freq attribute, so we now read it with getattr(y.index, "freq", None) instead of assuming it exists.
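
The difference is easy to reproduce in isolation. The sketch below uses a hypothetical helper `index_freq_str`; how the PR actually stores the value in metadata is not shown here and may differ.

```python
import pandas as pd


def index_freq_str(index):
    """Hypothetical helper: frequency as a string, or None if unavailable."""
    # RangeIndex and plain Index objects have no .freq attribute, so direct
    # attribute access would raise AttributeError; getattr falls back to None.
    freq = getattr(index, "freq", None)
    if freq is None:
        return None
    return freq.freqstr


print(index_freq_str(pd.RangeIndex(3)))  # None
print(index_freq_str(pd.date_range("2024-01-01", periods=3, freq="D")))  # D
```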

Tests: tests/test_format_data_handle.py.


Does your contribution introduce a new dependency? If yes, which one?

No.


What should a reviewer concentrate their feedback on?

  • Correctness of the DatetimeIndex path vs previous behavior (gap-fill and changes_made)
  • Whether PeriodIndex handling is acceptable or should be narrowed further
  • Whether calendar_gap_fill_* fields in changes_made are the right contract for callers

Any other comments?

To verify locally:
python -m pytest tests/test_format_data_handle.py -v


PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

@Saloni-0465 Saloni-0465 force-pushed the fix/auto-format-non-datetime-index branch from b20891d to c8c9ffd Compare May 8, 2026 13:03