
Milestones


  • This will be complete when GVL supports an option to disable shifting and reference genome padding, such that returned sequences correspond exactly to the regions originally given to `gvl.write`. To do this, GVL must also support returning ragged arrays or batches of data padded with something other than reference sequence.

    **Implementation notes** For the ragged array case, this would probably amount to partitioning the ragged array's data according to the reconstruction algorithm, so it could be quite simple. The padding case is also relatively easy: skip the ref-padding block in the reconstruction algorithm.

    No due date
    4/4 issues closed
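The two return modes described in this milestone can be sketched with plain NumPy. This is a minimal illustration, not GVL's implementation: the helper names `to_ragged` and `to_padded`, the `(data, offsets)` layout, and the `N` pad character are all assumptions made for the example.

```python
import numpy as np


def to_ragged(seqs):
    """Pack variable-length byte strings into flat data plus offsets.

    Hypothetical helper: row i is data[offsets[i]:offsets[i + 1]].
    """
    lengths = np.array([len(s) for s in seqs], dtype=np.int64)
    offsets = np.concatenate([[0], np.cumsum(lengths)])
    data = np.frombuffer(b"".join(seqs), dtype="S1")
    return data, offsets


def to_padded(data, offsets, pad=b"N"):
    """Right-pad each ragged row with `pad` instead of reference sequence."""
    lengths = np.diff(offsets)
    out = np.full((len(lengths), lengths.max()), pad, dtype="S1")
    for i, (start, end) in enumerate(zip(offsets[:-1], offsets[1:])):
        out[i, : end - start] = data[start:end]
    return out


# Three haplotypes of the same region whose lengths differ because of indels.
haps = [b"ACGGT", b"ACT", b"ACGT"]
data, offsets = to_ragged(haps)
padded = to_padded(data, offsets)
```

Here `offsets` is `[0, 5, 8, 12]`, and the second row of `padded` is `ACTNN`: the sequence is left at its true length in the ragged form and filled with a neutral character in the padded form.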
  • This will be complete when GVL supports returning sequence annotations that label each nucleotide with its variant index (the row of the VCF/PGEN that the nucleotide corresponds to) and its reference coordinate. For example, consider a toy sequence for chr1:1-10

    ```text
    personalized:           A  C  G  T ...  T  T  A ...
    variant indices:       -1  3  3 -1 ... -1  4 -1 ...
    reference coordinates:  1  2  2  3 ...  6  7  9 ...
    ```

    where variant 3 is a CG insertion (i.e. match/mismatch C + inserted G) and variant 4 is a "T-" deletion (the T at coordinate 7 is kept and the base at coordinate 8 is removed).

    **Implementation notes** It is likely most efficient to compute these annotations during haplotype reconstruction, in a single pass, since the values are already being computed and used there. For the API, they can likely be settings that are activated/deactivated similarly to how haps and tracks are enabled/disabled.

    No due date
    3/3 issues closed