break(datasets) Rename resplitter parameter and type to preprocessor #3476
@adam-narozniak looks good to me, some comments below. I wonder if it would make sense to rename the `resplitter` directory to `preprocessor`.
@@ -25,7 +25,7 @@
class TestResplitter(unittest.TestCase):
Should we rename to `TestPreprocessor`?
I think for that I should split the file, so there would be one for `merge_resplitter` and one for the preprocessor in general. I'll do that after we confirm the directory renaming.
We can do that, and I can move the preprocessor definition there, too. Maybe in the future we will need more subdirectories there, e.g., if we had more types of preprocessors.
Based on the discussion outside of GitHub, I did the following:
Issue
What is currently named `Resplitter = Callable[[DatasetDict], DatasetDict]`, and used in `FederatedDataset(..., resplitter: Union[Resplitter, ...])`, is not the most accurate description of what can happen.

Description
The `resplitter` can currently serve more as a `preprocessor` (a broader term that better captures the essence of the object). After the dataset is downloaded, more operations than resplitting can happen. I'll list a few of them:
Currently, even though such operations can happen, they are captured as `Resplitter` (a callable that accomplishes the goal).

Proposal
Rename the `resplitter` parameter of `FederatedDataset` and the `Resplitter` callable to `preprocessor` and `Preprocessor`, respectively, to better capture the essence of what can happen.
Rename the `resplitter` directory to `preprocessor`.
Also simplify the following names:
- `MergeResplitter` to `Merger`
- `DivideResplitter` to `Divider`
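
To illustrate why the broader name fits, here is a minimal sketch of the proposed `Preprocessor` alias with two callables of the same signature: one that resplits and one that does not. Everything apart from the alias shape is an assumption made for illustration: `DatasetDict` is replaced by a plain-dict stand-in so the snippet runs without dependencies, and `merge_train_and_test`, `drop_unlabeled`, and `apply_preprocessor` are hypothetical helpers, not part of the library.

```python
from typing import Callable, Dict, List

# Stand-in for `datasets.DatasetDict` (assumption, to keep the sketch
# dependency-free): a mapping from split name to a list of rows.
DatasetDict = Dict[str, List[dict]]

# The proposed alias: same signature as the old `Resplitter`, broader name.
Preprocessor = Callable[[DatasetDict], DatasetDict]


def merge_train_and_test(dataset: DatasetDict) -> DatasetDict:
    """Resplitting: combine two splits into one (hypothetical helper)."""
    return {"all": dataset["train"] + dataset["test"]}


def drop_unlabeled(dataset: DatasetDict) -> DatasetDict:
    """Not resplitting: filter rows inside each split (hypothetical helper)."""
    return {
        split: [row for row in rows if row.get("label") is not None]
        for split, rows in dataset.items()
    }


def apply_preprocessor(dataset: DatasetDict, preprocessor: Preprocessor) -> DatasetDict:
    """Apply a preprocessor to an already-loaded dataset (hypothetical helper)."""
    return preprocessor(dataset)


raw = {
    "train": [{"x": 1, "label": 0}, {"x": 2, "label": None}],
    "test": [{"x": 3, "label": 1}],
}
merged = apply_preprocessor(raw, merge_train_and_test)
cleaned = apply_preprocessor(raw, drop_unlabeled)
```

Both callables satisfy the same type, but only the first changes the split structure, which is the case the name `Resplitter` describes poorly.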