Better selection of sequences when a taxon has more than `--seqs-per-taxon`

Currently, if a taxon has more than `--seqs-per-taxon` sequences (default 3; #28 suggests 9), only the longest ones are kept, with ties broken arbitrarily. Andreas Kolter suggested two improvements: use voucher information from NCBI to avoid taking multiple sequences from the same specimen, and shuffle NCBI IDs to sample more diverse studies, because sequences from the same study often get similar IDs.
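
A rough sketch of how the two ideas could combine, not a proposal for the actual implementation. Everything here is hypothetical: the `Candidate` type, the `select_sequences` function, and the assumption that a voucher string has already been extracted for each record (for example from the GenBank `specimen_voucher` qualifier). The shuffle replaces the arbitrary tie-breaking with a seeded random one, so equally long sequences with neighbouring IDs are not systematically picked together. Falling back to same-voucher sequences when there are too few distinct specimens is also just one possible choice.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one sequence of a taxon."""
    accession: str
    length: int
    voucher: Optional[str] = None  # None if NCBI has no voucher info


def select_sequences(records: List[Candidate],
                     max_per_taxon: int = 3,
                     seed: int = 0) -> List[Candidate]:
    """Keep at most max_per_taxon records, preferring long sequences
    from distinct specimens."""
    rng = random.Random(seed)  # seeded so runs stay reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    # sorted() is stable, so records of equal length keep their shuffled order
    ranked = sorted(shuffled, key=lambda r: r.length, reverse=True)

    chosen: List[Candidate] = []
    skipped: List[Candidate] = []
    seen_vouchers = set()
    for rec in ranked:
        if len(chosen) == max_per_taxon:
            break
        if rec.voucher is not None and rec.voucher in seen_vouchers:
            skipped.append(rec)  # same specimen as an earlier pick
            continue
        if rec.voucher is not None:
            seen_vouchers.add(rec.voucher)
        chosen.append(rec)

    # Assumption: if there are too few distinct specimens, fill up with
    # the longest skipped sequences rather than returning fewer records.
    chosen.extend(skipped[: max_per_taxon - len(chosen)])
    return chosen


records = [
    Candidate("A1", 700, "V1"),
    Candidate("A2", 690, "V1"),  # same specimen as A1
    Candidate("A3", 650, "V2"),
    Candidate("A4", 600),        # no voucher information
    Candidate("A5", 500, "V3"),
]
picked = [r.accession for r in select_sequences(records, max_per_taxon=3)]
print(picked)  # ['A1', 'A3', 'A4']
```

With the current behaviour the same input would keep A1, A2 and A3, i.e. two sequences from one specimen.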