Output order of individuals from write_vcf #3109
-
Hello, thanks so much for writing tskit! Awesome package. I have a quick question that I can't find the answer to: in what order does the `write_vcf()` function write individuals to the VCF? We are running an msprime demographic simulation, then sampling individuals at different time points with code like this: `temporal_samples=[` …

We don't change the default naming scheme, so the individuals are named `tsk_0`, `tsk_1`, etc. My question is: in this example, would `tsk_0` through `tsk_5` be the individuals sampled at time 0, and `tsk_6` the individual sampled at time 3? So, more generally, are the first individuals in the VCF always the ones from the most recent sampling time? I'm just wondering how individuals are named according to when they're sampled during the msprime simulation. Thanks!
-
Hi @mountainmanjared!

The `tsk_X` numbers refer to indexes into tskit's individual table, so we can check by looking at the `individuals_time` array:

```python
>>> import msprime
>>> temporal_samples = [
...     msprime.SampleSet(num_samples=6, population=0, time=0),
...     msprime.SampleSet(num_samples=1, population=0, time=3),
... ]
>>> ts = msprime.sim_ancestry(
...     samples=temporal_samples,
...     population_size=1000,
...     sequence_length=10000,
...     recombination_rate=1e-8,
... )
>>> ts.individuals_time
array([0., 0., 0., 0., 0., 0., 3.])
```

So yes, `tsk_6` is t… Looking at the msprime code at https://github.com/tskit-dev/msprime/blob/main/msprime/ancestry.py#L708 we see that the …