Improve logging in datasets? #7047

Open
@pmeier

Status Quo

Currently our datasets sometimes print diagnostic messages:

print("Files already downloaded and verified")

The common download utilities write to STDOUT:

print("Downloading " + url + " to " + fpath)

and use tqdm, which writes to STDERR:

with open(destination, "wb") as fh, tqdm(total=length) as pbar:

tqdm can be told to write to a different stream (via its file argument), but our fallback from torch.hub cannot.
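To make the split concrete, here is a minimal stdlib-only sketch. `fake_download` is a hypothetical stand-in, not a torchvision function: it mimics a helper that prints its status message to STDOUT and writes progress to STDERR, the way tqdm does by default. The URL and path are placeholders.

```python
import contextlib
import io
import sys


def fake_download(url, fpath):
    # Hypothetical stand-in for a download helper.
    # The status message goes to STDOUT via print().
    print("Downloading " + url + " to " + fpath)
    # Stand-in for a progress bar that writes to STDERR by default.
    sys.stderr.write("100%\n")


captured_out = io.StringIO()
captured_err = io.StringIO()

# Both streams have to be redirected to capture everything.
with contextlib.redirect_stdout(captured_out), contextlib.redirect_stderr(captured_err):
    fake_download("https://example.com/data.tar", "/tmp/data.tar")
```

After this runs, the status message is in `captured_out` and the progress text is in `captured_err`, so anyone who redirects only one of the two streams still sees the other half of the output.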

In some cases, information is also logged by our dependencies:

self.coco = COCO(annFile)

In all of these cases, the user has no control over the output.

Proposal

Have a global or local setting for the stream we write to. For example:

torchvision.datasets.logging_stream()

I would default it to sys.stdout, but I have no strong opinion. To silence everything, one could do:

import os
import torchvision

torchvision.datasets.logging_stream(open(os.devnull, "w"))

We could also add a shortcut with quiet=True for that.
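One way the proposal could look is sketched below. Nothing here exists in torchvision today: `logging_stream`, its `quiet` keyword, the module-level `_LOGGING_STREAM`, and the `log` helper are all assumptions for illustration, written as a standalone module so the idea can be tried in isolation.

```python
import io
import os
import sys

# Hypothetical module-level state; None means "use the current sys.stdout".
_LOGGING_STREAM = None


def logging_stream(stream=None, *, quiet=False):
    """Get or set the stream used for dataset diagnostics (hypothetical API)."""
    global _LOGGING_STREAM
    if quiet:
        # Shortcut for silencing everything. The handle is never closed here.
        stream = open(os.devnull, "w")
    if stream is not None:
        _LOGGING_STREAM = stream
    return _LOGGING_STREAM if _LOGGING_STREAM is not None else sys.stdout


def log(message):
    # Datasets would call this instead of a bare print(message).
    print(message, file=logging_stream())


# Usage: redirect diagnostics into an in-memory buffer.
buffer = io.StringIO()
logging_stream(buffer)
log("Files already downloaded and verified")
```

Calling `logging_stream()` without arguments returns the current stream, so the same function works as getter and setter, matching the call shapes shown above. A setting like this would only cover torchvision's own messages; output produced by dependencies would still need separate handling.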

Priority

This was touched on in #330 (comment), and from time to time we receive issues asking either to silence the output (#330) or to redirect it to a different stream (#7040).

Still, I think the priority is pretty low for this. I just wanted to have it in a separate issue to make it easier to track.
