Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

break(datasets) Rename resplitter parameter and type to preprocessor #3476

Merged
merged 11 commits into from
May 29, 2024

Conversation

adam-narozniak
Copy link
Contributor

@adam-narozniak adam-narozniak commented May 20, 2024

Issue

What is currently named as Resplitter= Callable[[DatasetDict], DatasetDict] and in FederatedDataset(..., resplittter: Union[Resplitter, ...]` is not the most accurate description what can happen.

Description

The resplitter can currently serve more as a preprocessor (it seems like a broader term to better capture the essence of the object).
After the dataset is downloaded, more operations that resplitting can happen. I'll list a few of them:

  • removal of some data (not meeting certain criteria)
  • transformation e.g., different date format
  • creation of new features
    Currently, even though such operations can happen, they are captured as Resplitter (a callable that accomplishes the goal).

Proposal

Rename the resplitter parameter of FederatedDataset and Resplitter the callable to preprocessor and Preprocessor to better capture the essence of what can happen.
Rename the resplitter directory to preprocessor.

Also simplify the following names:

  • MergeResplitter to Merger
  • DivideResplitter to Divider

@adam-narozniak adam-narozniak marked this pull request as ready for review May 20, 2024 13:09
Copy link
Contributor

@jafermarq jafermarq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam-narozniak looks good to me, some comments below. I wonder if it would make sense to rename the resplitter directory to preprocessor.

@@ -25,7 +25,7 @@


class TestResplitter(unittest.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename to TestPreprocessor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for that I should divide the file so there would be one for merge_resplitter and one for preprocessor in general. I'll do after we confirm the directory renaming.

@adam-narozniak
Copy link
Contributor Author

@adam-narozniak looks good to me, some comments below. I wonder if it would make sense to rename the resplitter directory to preprocessor.

We can do that, and I can move the preprocessor definition there, too. Maybe in the future, we will need to have more subdirs there, e.g., if we had more types of preprocessors.

@adam-narozniak
Copy link
Contributor Author

Based on the discussion off the github I did the following:

  • rename the resplitter dir to preprocessor
  • moved back the preprocessor.py (renamed from resplitter.py)
  • rename Resplitter to Preprocessor
  • simplify the naming of *Resplitter to drop it from the names (so we have now Merger and Divider)
    (the description of the PR is updated accordingly)

@jafermarq jafermarq enabled auto-merge (squash) May 29, 2024 10:44
@jafermarq jafermarq merged commit a443f86 into main May 29, 2024
35 checks passed
@jafermarq jafermarq deleted the fds-rename-resplitter branch May 29, 2024 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants