provide one "partial" field per entry in aggregated responses #1532
Comments
Note that this means changing the format (and implementation) of the config-parquet-and-info step, and recomputing all its artifacts 😬 Also: the field
Maybe https://github.com/huggingface/moon-landing/pull/7079 (internal) is sufficient for now, i.e. show a general warning for the dataset if any of its splits is partial.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I think we should store which splits are partial and which are complete. Opening an issue for that -> #2809, and this one will depend on it.
Note that we can get this info per split already for free for most datasets:
So actually we should be able to retrieve most of the
Yes, it would be a good way to migrate the cache entries to the new schema instead of recomputing them in #2809.
For example, https://datasets-server.huggingface.co/size?dataset=c4 only provides a global `partial: true` field, and the response does not make explicit that the "train" split is partial while the "test" one is complete.

Every entry in `configs` and `splits` should also include its own `partial` field, so that this information can be shown in the viewer (selects).

Endpoints where we want these extra fields:
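As an illustration of the per-entry `partial` field proposed above, here is a minimal sketch in Python. The response layout (a `size` object holding `configs` and `splits` lists, plus a top-level `partial`), the helper name `add_partial_fields`, the `en` config name, and the sample data are all assumptions made for the example, not the actual datasets-server schema or implementation.

```python
from typing import Any


def add_partial_fields(
    response: dict[str, Any],
    partial_splits: set[tuple[str, str]],
) -> dict[str, Any]:
    """Return a copy of `response` where every entry carries its own "partial" flag.

    `partial_splits` is a hypothetical set of (config, split) pairs known to be partial.
    """
    size = response["size"]
    # A split is partial if it appears in the known set of partial splits.
    splits = [
        {**entry, "partial": (entry["config"], entry["split"]) in partial_splits}
        for entry in size["splits"]
    ]
    # A config is partial as soon as one of its splits is partial.
    partial_configs = {entry["config"] for entry in splits if entry["partial"]}
    configs = [
        {**entry, "partial": entry["config"] in partial_configs}
        for entry in size["configs"]
    ]
    return {
        **response,
        "size": {**size, "configs": configs, "splits": splits},
        # The global flag stays, derived from the per-split flags.
        "partial": any(entry["partial"] for entry in splits),
    }


# Made-up sample shaped like the c4 case described above (assumed layout).
example = {
    "size": {
        "configs": [{"dataset": "c4", "config": "en"}],
        "splits": [
            {"dataset": "c4", "config": "en", "split": "train"},
            {"dataset": "c4", "config": "en", "split": "test"},
        ],
    },
    "partial": True,
}

annotated = add_partial_fields(example, {("en", "train")})
for entry in annotated["size"]["splits"]:
    print(entry["split"], entry["partial"])
```

With per-entry flags like these, a viewer could mark only the "train" split as partial in its selects instead of showing a dataset-wide warning.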