VDBManagerPathType() returns kptNotFound under stress #25

I have been benchmarking the loading of SRA records on AWS, using the VDB API to stream sequence data (no quality scores or other information). Similar to the `fasterq-dump` strategy, I am attempting to read each SRA record in parallel, but using the Message Passing Interface (MPI) instead of just threads. Each MPI rank opens the SRA record and reads a non-overlapping slice of it.
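To make the slicing concrete, here is a minimal sketch of how each rank could compute its own row range. The names `RowSlice` and `slice_for_rank` are hypothetical and not part of VDB or MPI. In the real program I would expect `first_row` and `row_count` to come from something like `VCursorIdRange()`, and `rank` and `n_ranks` from `MPI_Comm_rank()` and `MPI_Comm_size()`; that wiring is assumed and not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open row range [first, first + count) owned by one rank.
 * Hypothetical helper type, not part of the VDB API. */
typedef struct {
    int64_t  first;
    uint64_t count;
} RowSlice;

/* Split row_count rows, starting at first_row, into n_ranks contiguous,
 * non-overlapping slices. The first (row_count % n_ranks) ranks each get
 * one extra row so the slices cover every row exactly once. */
RowSlice slice_for_rank(int64_t first_row, uint64_t row_count,
                        int rank, int n_ranks)
{
    assert(n_ranks > 0 && rank >= 0 && rank < n_ranks);

    uint64_t base  = row_count / (uint64_t)n_ranks;  /* rows every rank gets */
    uint64_t extra = row_count % (uint64_t)n_ranks;  /* leftover rows */
    uint64_t r     = (uint64_t)rank;

    /* Ranks before this one own r * base rows plus one leftover row each,
     * up to the number of leftover rows available. */
    uint64_t offset = r * base + (r < extra ? r : extra);

    RowSlice s;
    s.first = first_row + (int64_t)offset;
    s.count = base + (r < extra ? 1u : 0u);
    return s;
}
```

Each rank then reads only rows `s.first` through `s.first + s.count - 1`, so no two ranks touch the same row.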

When the number of parallel MPI ranks grows beyond about 32, I've noticed that `VDBManagerPathType()` starts returning `kptNotFound` for about 10% of the MPI processes. I've been able to work around this by retrying the call to `VDBManagerPathType()` after waiting 5 seconds. Is there a good way to read an SRA record in parallel, ideally using hundreds of independent but concurrent processes? I am interested in extracting reads from an SRA file as fast as AWS will allow.
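The retry workaround looks roughly like the sketch below. This is a generic version with hypothetical names (`probe_fn`, `retry_with_backoff`, `flaky_probe`). In the real program the probe would wrap the `VDBManagerPathType()` call and report success when the result is anything other than `kptNotFound`. The `flaky_probe` stub only stands in for that call so the sketch runs without the VDB library. The growing delay and random jitter are additions of mine to keep the ranks from all retrying at the same moment (the actual workaround used a fixed 5 second wait), and `sleep()` assumes a POSIX system.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* A probe returns nonzero on success and 0 on a (possibly transient) failure. */
typedef int (*probe_fn)(void *ctx);

/* Call probe up to max_attempts times. After a failed attempt number k,
 * wait k * base_delay_s seconds plus 0..jitter_s random seconds.
 * Returns the 1-based attempt number that succeeded, or 0 if all failed. */
int retry_with_backoff(probe_fn probe, void *ctx, int max_attempts,
                       unsigned base_delay_s, unsigned jitter_s)
{
    assert(probe != NULL && max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (probe(ctx))
            return attempt;

        if (attempt < max_attempts) {
            unsigned jitter = jitter_s ? (unsigned)rand() % (jitter_s + 1u) : 0u;
            sleep(base_delay_s * (unsigned)attempt + jitter);
        }
    }
    return 0;
}

/* Stand-in for the real check: fails on the first two calls, then succeeds.
 * ctx points to an int call counter. */
int flaky_probe(void *ctx)
{
    int *calls = (int *)ctx;
    return ++*calls >= 3;
}
```

With the real probe plugged in, something like `retry_with_backoff(probe, &state, 5, 5, 2)` would try up to five times. This only hides the failures and does not explain why `kptNotFound` is returned in the first place.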

I was assuming that the data is stored in an S3 bucket and that parallel access would be okay. I'm not exactly sure where the data is being stored, since the `srapath` command returns:
`https://locate.ncbi.nlm.nih.gov/sdlr/sdlr.fcgi?jwt=<long string of characters removed>`. 
