Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent using cache_subdirectory=None on JobRunnerWithCache's children #1436

Open
AndreaFrancis opened this issue Jun 27, 2023 · 3 comments
Open

Comments

@AndreaFrancis
Copy link
Contributor

Currently, all job runners that depend on JobRunnerWithCache and use the cache_subdirectory field, need to do validation before using the generated value like here https://github.com/huggingface/datasets-server/pull/1296/files#diff-d9a2c828d7feca3b7f9e332e040ef861e842a16d18276b356461d2aa34396a8aR248

We need a better way to prevent using None values given that JobRunnerWithCache has https://github.com/huggingface/datasets-server/blob/main/services/worker/src/worker/job_runners/_job_runner_with_cache.py#L23 cache_subdirectory: Optional[Path].
If the pre_compute method is not run, job runner children could fail because of non-existing directory.
@severo suggested adding a test on each job runner implementation of JobRunnerWithCache to ensure the validation is done.

@severo
Copy link
Collaborator

severo commented Jun 27, 2023

@severo suggested adding a test on each job runner implementation of JobRunnerWithCache to ensure the validation is done.

even better: ideally the compute method would have access to a Path, not an Optional[Path]

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@severo
Copy link
Collaborator

severo commented Aug 7, 2023

P2 label should have prevented the stale bot from closing the issue. Fixed with #1635.

@severo severo reopened this Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants