
Reproducibility of dataset downloads #126

@faroit


When downloading corpora from versioned data stores, I would expect audiomate to take into account a tag or a specific hash of the dataset. That way, users can be sure that a specific version of audiomate yields an identical corpus, which fosters reproducibility.

For example, take the ESC-50 corpus: its download URL points directly at the master branch, so the content it fetches can change whenever the repository is updated:

DOWNLOAD_URL = 'https://github.com/karoldvl/ESC-50/archive/master.zip'
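A minimal sketch of pinning the URL above, assuming GitHub's `archive/<ref>.zip` scheme also accepts a release tag or a full commit hash in place of `master`. The helper name `github_archive_url` and the `ESC50_REF` value are hypothetical placeholders, not part of audiomate and not a real ESC-50 release.

```python
# Hypothetical helper (not part of audiomate): build a GitHub archive URL
# pinned to an immutable ref (a release tag or a full commit hash) instead
# of a moving branch such as master.
def github_archive_url(owner: str, repo: str, ref: str) -> str:
    """Return the zip archive URL for `owner/repo` at `ref`."""
    return f"https://github.com/{owner}/{repo}/archive/{ref}.zip"


# Placeholder only: substitute an actual ESC-50 tag or commit hash.
ESC50_REF = "<tag-or-commit-hash>"
DOWNLOAD_URL = github_archive_url("karoldvl", "ESC-50", ESC50_REF)
```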

To improve reproducibility, I suggest that audiomate use tags where possible (GitHub, Zenodo, ...) and additionally provide a checksum mechanism that verifies each download completed successfully.
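A minimal sketch of the checksum idea, assuming the maintainers record a SHA-256 digest next to each pinned URL. `sha256_of_file` and `download_and_verify` are hypothetical names, not existing audiomate functions; only the standard library's `hashlib` and `urllib.request` are used.

```python
import hashlib
import os
import urllib.request


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_and_verify(url: str, target_path: str, expected_sha256: str) -> str:
    """Download `url` to `target_path` and compare it with a known SHA-256.

    Hypothetical helper, not part of audiomate's API. On a mismatch the
    downloaded file is deleted and ValueError is raised.
    """
    urllib.request.urlretrieve(url, target_path)
    actual = sha256_of_file(target_path)
    if actual != expected_sha256.lower():
        os.remove(target_path)
        raise ValueError(
            f"Checksum mismatch for {url}: expected {expected_sha256}, got {actual}"
        )
    return target_path
```

Reading in chunks keeps memory use flat for large archives, and deleting the file on a mismatch prevents a corrupt or altered download from being silently reused on the next run.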

This issue is part of a JOSS review openjournals/joss-reviews#2135
