* It is now possible to manually download the data for all datasets (if the automated download fails for any reason). See the [doc](https://www.tensorflow.org/datasets/overview#load_a_dataset).
* Simplification of the dataset creation API.
  * We've made it easier to create datasets outside the TFDS repository (see our updated [dataset creation guide](https://www.tensorflow.org/datasets/add_dataset)).
  * `_split_generators` should now return `{'split_name': self._generate_examples(), ...}` (but current datasets are backward compatible).
  * All datasets inherit from `tfds.core.GeneratorBasedBuilder`. Converting a dataset to Beam now only requires changing `_generate_examples` (see [example and doc](https://www.tensorflow.org/datasets/beam_datasets#instructions)).
  * `tfds.core.SplitGenerator` and `tfds.core.BeamBasedBuilder` are deprecated and will be removed in a future version.
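
The new `_split_generators` return shape can be sketched without TFDS installed. This is a minimal illustration, not the library's implementation: `ToyBuilder` and `build_toy_splits` are hypothetical names, and the method takes a plain directory instead of the `dl_manager` a real builder would receive. It only shows the dict that maps split names to generators of `(key, example)` pairs.

```python
import tempfile
from pathlib import Path


class ToyBuilder:
    """Hypothetical stand-in for a `tfds.core.GeneratorBasedBuilder` subclass."""

    def _split_generators(self, data_dir):
        # New style: map each split name directly to an example generator.
        # (Assumption: a real builder receives `dl_manager`, not a directory.)
        return {
            'train': self._generate_examples(data_dir / 'train'),
            'test': self._generate_examples(data_dir / 'test'),
        }

    def _generate_examples(self, split_dir):
        # Yield (unique key, example dict) pairs, one per text file.
        for path in sorted(split_dir.glob('*.txt')):
            yield path.name, {'text': path.read_text()}


def build_toy_splits():
    """Create a throwaway dataset directory and read it back split by split."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        layout = {'train': ['a.txt', 'b.txt'], 'test': ['c.txt']}
        for split, names in layout.items():
            (root / split).mkdir()
            for name in names:
                (root / split / name).write_text(f'content of {name}')
        generators = ToyBuilder()._split_generators(root)
        # Generators are lazy, so consume them before the directory is removed.
        return {split: list(gen) for split, gen in generators.items()}


splits = build_toy_splits()
```

In a real dataset the same two methods would live on a `tfds.core.GeneratorBasedBuilder` subclass, alongside the dataset metadata described in the dataset creation guide.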
* Better `pathlib.Path`, `os.PathLike` compatibility:
  * `dl_manager.manual_dir` now returns a pathlib-like object. Example:
    ```python
    text = (dl_manager.manual_dir / 'downloaded-text.txt').read_text()
    ```
  * Note: Other methods (`dl_manager.download`, `.extract`, ...) will return pathlib-like objects in future versions.
  * `FeatureConnector`, ... and most functions should accept `PathLike` objects. Let us know if some functions you need are missing.
  * Add `tfds.core.as_path` to create `pathlib.Path`-like objects compatible with GCS (e.g. `tfds.core.as_path('gs://my-bucket/labels.csv').read_text()`).
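
As a rough illustration of what accepting `PathLike` objects means for user code, the sketch below writes a helper that takes either a `str` or a path object. It uses only the standard library: `read_labels` and `labels.txt` are hypothetical names, and a temporary directory stands in for `dl_manager.manual_dir`.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Union


def read_labels(manual_dir: Union[str, os.PathLike]) -> List[str]:
    """Read one label per line from `labels.txt` inside `manual_dir`."""
    # Normalizing with Path() lets callers pass either a str or a path object.
    labels_path = Path(manual_dir) / 'labels.txt'
    return labels_path.read_text().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    # The temporary directory plays the role of `dl_manager.manual_dir`.
    (Path(tmp) / 'labels.txt').write_text('cat\ndog\n')
    labels_from_path = read_labels(Path(tmp))
    labels_from_str = read_labels(tmp)
```

Note that `pathlib.Path` only handles local paths; for `gs://` locations, the notes above point to `tfds.core.as_path` instead.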
* Other bug fixes and improvements, e.g.:
  * Add `verify_ssl=` option to `tfds.download.DownloadConfig` to disable SSL certificate verification during download.
  * `BuilderConfig` is now compatible with Beam datasets #2348
  * `--record_checksums` now assumes the new dataset-as-folder model.
  * `tfds.features.Image` can accept encoded `bytes` images directly (useful when used with `for img_name, img_bytes in dl_manager.iter_archive('images.zip')`).
  * The API docs now show deprecated methods, and abstract methods to override are now documented.
  * You can generate `imagenet2012` with only a single split (e.g. only the validation data). Other splits will be skipped if not present.
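
The encoded-bytes image item above relies on iterating an archive as `(name, content)` pairs. The sketch below imitates that pattern with the standard `zipfile` module only: `iter_zip_bytes` is a hypothetical helper, not a TFDS function, and the payloads are placeholder bytes rather than real images.

```python
import io
import zipfile


def iter_zip_bytes(archive):
    """Yield (member name, raw bytes) for each file in a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                yield info.filename, zf.read(info)


# Build a tiny in-memory archive; the payloads are placeholders, not real images.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('images/cat.png', b'fake-png-bytes')
    zf.writestr('images/dog.png', b'more-fake-bytes')
buffer.seek(0)

# Each example keeps the encoded bytes as-is, keyed by the archive member name.
examples = {name: {'image': data} for name, data in iter_zip_bytes(buffer)}
```

In a TFDS builder, the pairs would come from `dl_manager.iter_archive` and the encoded bytes would be handed to the image feature without decoding them first.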
* And of course, new datasets...
Thank you to all our contributors for improving TFDS!
PiperOrigin-RevId: 340614460