Improve logging in datasets? #7047

## Status Quo

Currently our datasets sometimes print diagnostic messages:

https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/caltech.py#L128

The common download utilities write to STDOUT

https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/utils.py#L156

and use `tqdm`, which writes to STDERR:

https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/utils.py#L36

`tqdm` has the [option to write to a different stream](https://github.com/tqdm/tqdm/blob/6791e8c5b3d6c30bdd2060c346996bfb5a6f10d1/tqdm/std.py#L873-L876), but our [fallback from `torch.hub`](https://github.com/pytorch/pytorch/blob/63b8ecc4154b5f292a558c1a9d556176c005c085/torch/hub.py#L40-L44) does not.

In some cases, information is also logged by our [dependencies](https://github.com/ppwwyyxx/cocoapi/blob/71e284ef862300e4319aacd523a64c7f24750178/PythonAPI/pycocotools/coco.py#L79):

https://github.com/pytorch/vision/blob/657c0767c5ca5564c8b437ac44263994c8e01352/torchvision/datasets/coco.py#L36

In any case, the user has no control over it whatsoever.
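For context, the closest thing to control today is a blanket redirect of the process-wide streams, which also swallows any unrelated output. A minimal sketch of that workaround, using only the standard library; `noisy_init` is a hypothetical stand-in for a dataset constructor, not torchvision code:

```python
import contextlib
import io
import sys


def noisy_init() -> str:
    # Hypothetical stand-in for a dataset constructor: one message goes to
    # STDOUT (like a print call) and one to STDERR (like a progress bar).
    print("Files already downloaded and verified")
    sys.stderr.write("100%|##########| 1/1\n")
    return "dataset"


stdout_capture, stderr_capture = io.StringIO(), io.StringIO()

# Both streams have to be redirected separately, and everything else that
# writes to them inside the block is captured as well.
with contextlib.redirect_stdout(stdout_capture), contextlib.redirect_stderr(stderr_capture):
    dataset = noisy_init()
```

This works, but it is coarse: the caller has to know which stream each message goes to, and it cannot target dataset output alone.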

## Proposal

Have a global or local setting for the stream we write to. For example:

```py
torchvision.datasets.logging_stream()
```

I would default it to `sys.stdout`, but I have no strong opinion. To silence everything, one could do:

```py
import os

import torchvision

# Proposed API, not an existing torchvision function.
torchvision.datasets.logging_stream(open(os.devnull, "w"))
```

We could also add a shortcut with `quiet=True` for that.
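To make the idea more concrete, here is a rough sketch of how such a setting could behave. Everything in it is an assumption for discussion: the getter/setter semantics of `logging_stream`, the `quiet` keyword, and the `log` helper that stands in for the existing `print` calls are all hypothetical.

```python
import io
import os
import sys
from typing import Optional, TextIO

# Module-level state. None means "use whatever sys.stdout is at write time",
# so later changes to sys.stdout are still respected.
_stream: Optional[TextIO] = None


def logging_stream(stream: Optional[TextIO] = None, *, quiet: bool = False) -> TextIO:
    """Hypothetical setting: set the stream if one is given, return the active one."""
    global _stream
    if quiet:
        # Shortcut for silencing everything. The handle is kept open for the
        # lifetime of the process in this sketch.
        _stream = open(os.devnull, "w")
    elif stream is not None:
        _stream = stream
    return _stream if _stream is not None else sys.stdout


def log(message: str) -> None:
    """Hypothetical helper that dataset code would call instead of print()."""
    print(message, file=logging_stream())


# Usage: collect the messages in a buffer instead of printing them.
buffer = io.StringIO()
logging_stream(buffer)
log("Files already downloaded and verified")
```

A progress bar could be pointed at the same stream through the `file` option of `tqdm` mentioned above, although the `torch.hub` fallback would need the same option first.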

## Priority

This was touched on in https://github.com/pytorch/vision/issues/330#issuecomment-854715846, and from time to time we receive issues asking to either silence the output (#330) or redirect it to a different stream (#7040).

Still, I think the priority is pretty low for this. I just wanted to have it in a separate issue to make it easier to track.
