fix: restrict auto-format to datetime-like indices to prevent data co…#133
Open
Saloni-0465 wants to merge 1 commit into
Open
fix: restrict auto-format to datetime-like indices to prevent data co…#133Saloni-0465 wants to merge 1 commit into
Saloni-0465 wants to merge 1 commit into
Conversation
b20891d to
c8c9ffd
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Reference Issues/PRs
N/A
What does this implement/fix? Explain your changes.
Before: When formatting a data handle, the code tried to guess a calendar frequency and fill gaps using pd.date_range and reindex for every kind of index. That is only safe for real date/time style indexes. For simple row indexes (like RangeIndex or plain integers), it could change how many rows you have, add missing values, or otherwise mess up the series while still reporting success.
After: That calendar step only runs for proper date indexes (DatetimeIndex) and period indexes (PeriodIndex). For anything else, we skip that step and record why in changes_made (calendar_gap_fill_skipped and calendar_gap_fill_reason).
We also catch errors from pd.infer_freq when pandas refuses to infer (for example, very few dates).
We fixed a crash when saving metadata: some indexes don’t have a .freq attribute, so we use getattr(y.index, "freq", None) instead of assuming it exists.
Tests: tests/test_format_data_handle.py.
Does your contribution introduce a new dependency? If yes, which one?
No.
What should a reviewer concentrate their feedback on?
Any other comments?
To verify locally:
python -m pytest tests/test_format_data_handle.py -vPR checklist
For all contributions
For new estimators