Implement client for new PMC S3 content access #1500

Open
bgyori wants to merge 6 commits into gyorilab:master from bgyori:pmc_s3
bgyori (Member) commented Apr 17, 2026

This PR implements new client functions in indra.literature.pmc_client to migrate from the soon-to-be-deprecated FTP-based content access to the new S3-based access (see https://pmc.ncbi.nlm.nih.gov/tools/pmcaws/). Previously, each article had a non-obvious FTP path that had to be looked up in a catalogue by PMID or PMC ID and then used to fetch a tar file containing the "package" of files for the article. With S3, there is a fixed access path for a given PMC ID (+ version), and the list and types of files are available via a separate metadata request. The new client functions cover requests for the available (or latest) version and for the metadata, getter functions for different types of content, and a single function that downloads all article files into a given folder.
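To illustrate the difference from the FTP catalogue lookup, here is a minimal sketch of how a fixed path and a metadata-driven file selection could fit together. Everything in it is an assumption made for illustration: the function names (`article_prefix`, `latest_version`, `files_of_type`), the `PMC<id>.<version>/` prefix layout, and the shape of the metadata dictionary are hypothetical and are not taken from this PR or from the PMC documentation.

```python
from typing import Dict, Iterable, List, Optional


def article_prefix(pmc_id: str, version: int) -> str:
    """Build a deterministic prefix from a PMC ID and a version.

    ASSUMPTION: the "PMC<id>.<version>/" layout is hypothetical and only
    illustrates that no catalogue lookup is needed to derive the path.
    """
    if not pmc_id.startswith("PMC"):
        pmc_id = "PMC" + pmc_id
    return f"{pmc_id}.{version}/"


def latest_version(versions: Iterable[int]) -> Optional[int]:
    """Return the highest available version, or None if there is none."""
    versions = list(versions)
    return max(versions) if versions else None


def files_of_type(metadata: Dict[str, List[dict]], file_type: str) -> List[str]:
    """Select file names of one type from a metadata record.

    ASSUMPTION: the {"files": [{"name": ..., "type": ...}]} structure is a
    hypothetical stand-in for whatever the real metadata request returns.
    """
    return [
        entry["name"]
        for entry in metadata.get("files", [])
        if entry.get("type") == file_type
    ]


# Illustrative, made-up metadata record (not real PMC data).
example_metadata = {
    "files": [
        {"name": "article.xml", "type": "xml"},
        {"name": "article.pdf", "type": "pdf"},
        {"name": "fig1.jpg", "type": "image"},
    ]
}

version = latest_version([1, 3, 2])
prefix = article_prefix("1234567", version)
xml_keys = [prefix + name for name in files_of_type(example_metadata, "xml")]
```

The point of the sketch is only the control flow: pick a version, derive the path directly from the ID, then use the metadata to decide which files to fetch. The actual functions in indra.literature.pmc_client may be organized differently.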
