CRAM field sizes #58

@chris7716

I ran the following command to get the CRAM field sizes and got the output below.

./progs/cram_size ~/data/alignments/HG00438.final.cram

Block CORE, total size 46292414
Block content_id 1, total size 737242799 ASC NMc paf XSC MQc ASc XSc
Block content_id 9, total size 3690695708 PGZ FTZ XAZ SAZ ZAZ MCZ MDZ
Block content_id 11, total size 4818799619 RN
Block content_id 12, total size 2964685480 QS
Block content_id 13, total size 8874320 IN
Block content_id 14, total size 363946605 SC
Block content_id 15, total size 275327148 BF
Block content_id 16, total size 106995417 CF
Block content_id 17, total size 331742618 AP
Block content_id 18, total size 343420384 RG
Block content_id 19, total size 40571652 MQ
Block content_id 20, total size 10263823 NS
Block content_id 21, total size 5151436 MF
Block content_id 22, total size 54231023 TS
Block content_id 23, total size 99198297 NP
Block content_id 24, total size 334541841 NF
Block content_id 26, total size 130009094 FN
Block content_id 27, total size 47694920 FC
Block content_id 28, total size 433378423 FP
Block content_id 29, total size 1557480 DL
Block content_id 30, total size 46989745 BA
Block content_id 31, total size 93014727 BS
Block content_id 32, total size 46587756 TL
Block content_id 33, total size 884134 RI

May I know what exactly this output means? I am interested in the total sizes of the SAM fields.

Is the following interpretation of the output correct? If so, what is the size of the CIGAR field?


SAM Field   Description             CRAM Tag    Size (bytes)
------------------------------------------------------------
QNAME       Query template name     RN          4,818,799,619
FLAG        Bitwise flag            BF            275,327,148
RNAME       Reference sequence      AP            331,742,618
POS         1-based position        CORE           46,292,414
MAPQ        Mapping quality         MQ             40,571,652
CIGAR       CIGAR string            CG            (not shown)
RNEXT       Mate reference name     CF            106,995,417
PNEXT       Mate position           NF            334,541,841
TLEN        Template length         TS             54,231,023
SEQ         Segment sequence        BA             46,989,745
QUAL        Quality                 QS          2,964,685,480
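
To make the per-block totals easier to compare, the output above can be tabulated with a short script. This is only a minimal sketch, assuming every line has the form `Block CORE, total size N` or `Block content_id ID, total size N NAMES...` as shown above; the function name `parse_cram_size` is hypothetical and not part of `cram_size` or any library. It does not map data series to SAM fields, it only reports each block's share of the total.

```python
import re

# Assumed line format, taken from the output above:
#   Block CORE, total size 46292414
#   Block content_id 11, total size 4818799619 RN
LINE_RE = re.compile(r"^Block (CORE|content_id (\d+)), total size (\d+)\s*(.*)$")


def parse_cram_size(text):
    """Return a list of (block_label, size_bytes, names) tuples.

    block_label is "CORE" or the integer content_id; names is the list of
    data series / tag names printed after the size (may be empty).
    """
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a block summary line
        label = "CORE" if m.group(1) == "CORE" else int(m.group(2))
        rows.append((label, int(m.group(3)), m.group(4).split()))
    return rows


# Three lines copied from the output above, as a small sample.
sample = """\
Block CORE, total size 46292414
Block content_id 11, total size 4818799619 RN
Block content_id 12, total size 2964685480 QS
"""

rows = parse_cram_size(sample)
total = sum(size for _, size, _ in rows)

# Print each block with its size and its percentage of the sampled total.
for label, size, names in rows:
    share = 100 * size / total
    print(f"{label!s:>6} {size:>14,} {share:5.1f}%  {' '.join(names)}")
```

Saving the full `cram_size` output to a file and passing its contents to `parse_cram_size` would give the same breakdown for every block.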
