Genome download: which files to use?

I am trying to figure out which settings and files to use to get the most complete and correct representation of a genome.

In the code I found the following types of output files:

REPLICON = 'assembled-molecule'
UNLOCALISED = 'unlocalised-scaffold'
UNPLACED = 'unplaced-scaffold'
PATCH = 'patch'

When downloading a genome, for example GCA_000003215.1, with:

`enaBrowserTools/python3/enaDataGet -f embl --wgs --extract-wgs --expanded GCA_000003215.1`

It generates the following files:

-rw-r--r-- 1 root root 1746946 Oct 20 06:45 ABFD02.dat.gz
-rw-r--r-- 1 root root    5168 Oct 20 06:45 GCA_000003215.1.xml
-rw-r--r-- 1 root root    1242 Oct 20 06:45 GCA_000003215.1_sequence_report.txt
-rw-r--r-- 1 root root 5533183 Oct 20 06:45 assembled-molecule.dat
-rw-r--r-- 1 root root       0 Oct 20 06:45 wgs_scaffolds.dat

In this case, I assume assembled-molecule.dat is the most complete genome file?
It contains one chromosome with gaps of unknown size, while the gzipped file (ABFD02.dat.gz) contains the 31 contigs as separate entries.

Or would it be wiser to always use the gzipped file?
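
To make the comparison above concrete, here is a minimal sketch that counts the entries in each downloaded file. It assumes the files follow the standard EMBL flat-file layout, where every entry begins with an `ID` line. The helper names `open_text` and `count_embl_records` are my own and not part of enaBrowserTools. If my reading of the files is right, it should report one entry for assembled-molecule.dat and 31 for ABFD02.dat.gz, but I have not confirmed that this is the intended way to compare them.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed flat file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def count_embl_records(path):
    """Count entries by counting lines that start with the EMBL 'ID' line code.

    Assumption: the file is an EMBL flat file, so each entry has exactly
    one line beginning with 'ID' followed by three spaces.
    """
    path = Path(path)
    if path.stat().st_size == 0:
        # An empty file (such as wgs_scaffolds.dat above) holds no entries.
        return 0
    with open_text(path) as handle:
        return sum(1 for line in handle if line.startswith("ID   "))


if __name__ == "__main__":
    # File names taken from the directory listing above.
    for name in ("assembled-molecule.dat", "ABFD02.dat.gz", "wgs_scaffolds.dat"):
        if Path(name).exists():
            print(name, count_embl_records(name))
```

This only counts entries; it does not say which file is the better representation. It does show whether a file holds a single scaffolded sequence or the individual contigs.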


