Dear NCBI SRA Support Team,
I am writing to ask for your guidance on the recommended best practices for a large-scale batch analysis.
I am working on a university HPC cluster and need to extract 4 specific genomic regions (using sam-dump --aligned-region) from approximately 1,000 different SRR accessions (e.g., SRR1127217) for a research project.
I want to ensure we do this in the most efficient and respectful way possible. I am considering two potential workflows:
1. Direct Remote Query: Running sam-dump --aligned-region in a parallel SBATCH array, which would make ~4,000 separate remote queries to your servers (4 regions x 1,000 accessions).
2. Prefetch First: Running prefetch on all 1,000 accessions first to download the .sra files locally, and then running our sam-dump --aligned-region script on the local files.
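To make the two options concrete, here is a minimal sketch of how we would build the command lists for each workflow. It only constructs the commands and does not run them. The region strings are hypothetical placeholders rather than our real coordinates, the helper function names are ours, and the local path layout (<dir>/<accession>/<accession>.sra) is an assumption about where prefetch writes its output on our system.

```python
from itertools import product

# Hypothetical placeholder regions; our real coordinates are not shown here.
REGIONS = ["chr1:1000-2000", "chr2:3000-4000", "chr3:5000-6000", "chr4:7000-8000"]


def remote_commands(accessions, regions):
    """Workflow 1: one remote sam-dump call per (accession, region) pair."""
    return [
        ["sam-dump", "--aligned-region", region, acc]
        for acc, region in product(accessions, regions)
    ]


def prefetch_commands(accessions):
    """Workflow 2, step 1: one prefetch download per accession."""
    return [["prefetch", acc] for acc in accessions]


def local_commands(accessions, regions, sra_dir="."):
    """Workflow 2, step 2: sam-dump against local .sra files.

    Assumes prefetch wrote <sra_dir>/<acc>/<acc>.sra; the actual layout
    may differ depending on toolkit version and configuration.
    """
    return [
        ["sam-dump", "--aligned-region", region, f"{sra_dir}/{acc}/{acc}.sra"]
        for acc, region in product(accessions, regions)
    ]


if __name__ == "__main__":
    accessions = ["SRR1127217"]  # in practice, roughly 1,000 accessions
    print(len(remote_commands(accessions, REGIONS)), "remote sam-dump calls")
    print(len(prefetch_commands(accessions)), "prefetch downloads")
    print(len(local_commands(accessions, REGIONS)), "local sam-dump calls")
```

With the full list of 1,000 accessions, the first workflow yields about 4,000 remote sam-dump calls, while the second yields 1,000 downloads followed by about 4,000 purely local sam-dump calls.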
Could you please confirm which of these is the correct and recommended workflow? We want to follow the proper procedure to avoid causing unnecessary load on your servers and to prevent our cluster's IP from being rate-limited or blocked.
Thank you for your time.
Best,
Tushar