OTN is a great project, thank you all for it.
This issue aims to document a possible error in the "resolved" categorization.
# download data from
# https://github.com/open-traits-network/otn-taxon-trait-summary/blob/main/traits.csv.gz
# readr reads gzip-compressed files directly, so the file does not need to be decompressed first
otn_raw <- readr::read_csv("traits.csv.gz")

otn_dataset_try <- otn_raw |>
  # keep only rows resolved to the animal kingdom
  dplyr::filter(resolveKingdomName == "Animalia") |>
  # keep only rows that come from the "try" dataset
  dplyr::filter(datasetId == "https://opentraits.org/datasets/try")

dplyr::glimpse(otn_dataset_try)
# Rows: 5,311
# Columns: 31
# $ taxonIdVerbatim <chr> "1669", "1669", "1669", "1669", "1669", "1…
# $ scientificNameVerbatim <chr> "Agathis philippinensis", "Agathis philipp…
# $ resolvedTaxonId <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ resolvedTaxonName <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ parentTaxonId <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ family <chr> "Araucariaceae", "Araucariaceae", "Araucar…
# $ phylum <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ traitIdVerbatim <dbl> 37, 3400, 759, 98, 3401, 43, 22, 17, 4, 38…
# $ traitNameVerbatim <chr> "Leaf phenology type", "Plant growth form …
# $ bucketId <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ bucketName <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ counts <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ datasetId <chr> "https://opentraits.org/datasets/try", "ht…
# $ numberOfRecords <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 3, …
# $ curator <chr> "https://opentraits.org/members/brian-s-ma…
# $ accessDate <date> 2022-08-19, 2022-08-19, 2022-08-19, 2022-…
# $ comment <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ relationName <chr> "HAS_ACCEPTED_NAME", "HAS_ACCEPTED_NAME", …
# $ resolvedExternalId <chr> "COL:6635V", "COL:6635V", "COL:6635V", "CO…
# $ resolvedName <chr> "Agathis philippinensis", "Agathis philipp…
# $ resolvedRank <chr> "species", "species", "species", "species"…
# $ resolvedCommonNames <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# $ resolvedPath <chr> "Biota | Animalia | Arthropoda | Insecta |…
# $ resolvedPathIds <chr> "COL:5T6MX | COL:N | COL:RT | COL:H6 | COL…
# $ resolvedPathNames <chr> "unranked | kingdom | phylum | class | ord…
# $ resolvedExternalUrl <chr> "https://www.catalogueoflife.org/data/taxo…
# $ resolveKingdomName <chr> "Animalia", "Animalia", "Animalia", "Anima…
# $ resolvedPhylumName <chr> "Arthropoda", "Arthropoda", "Arthropoda", …
# $ resolvedFamilyName <chr> "Braconidae", "Braconidae", "Braconidae", …
# $ providedTraitName <chr> "Leaf phenology type", "Plant growth form …
# $ resolvedTraitName <chr> "Phenology", "Morphology", "UNCATEGORIZED_…
Here are some of the most frequent categories that appear in resolvedPhylumName/resolvedName from this query:
otn_dataset_try |>
  dplyr::count(
    datasetId,
    resolveKingdomName,
    resolvedPhylumName,
    resolvedName,
    sort = TRUE
  ) |>
  head()
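As a possible cross-check, the rows can also be screened by comparing the verbatim `family` column with `resolvedFamilyName`, both of which appear in the `glimpse()` output. The sketch below is only an illustration: `flag_family_mismatch()` is a hypothetical helper (not part of OTN), and it assumes that a disagreement between the two columns hints at a wrong resolution, which may not always hold (for example, when a family was renamed). The toy input uses one row copied from the `glimpse()` output and one made-up placeholder row.

```r
# Hypothetical helper: list names whose verbatim family differs from the
# family assigned during name resolution.
flag_family_mismatch <- function(traits) {
  traits |>
    dplyr::filter(
      !is.na(family),
      !is.na(resolvedFamilyName),
      family != resolvedFamilyName
    ) |>
    dplyr::count(scientificNameVerbatim, family, resolvedFamilyName, sort = TRUE)
}

# Toy input. Row 1 is taken from the glimpse() output above;
# row 2 is an invented placeholder where both families agree.
toy <- data.frame(
  scientificNameVerbatim = c("Agathis philippinensis", "Hypothetical species"),
  family                 = c("Araucariaceae", "Examplaceae"),
  resolvedFamilyName     = c("Braconidae", "Examplaceae")
)

flagged <- flag_family_mismatch(toy)
print(flagged)
```

On the real data, the same helper could be called as `flag_family_mismatch(otn_dataset_try)`.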
Hello!
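While using the dataset, Thiago (@thiago-goncalves-souza) and I noticed a possible categorization error in the "try" dataset (https://opentraits.org/datasets/try).
If we filter OTN to keep only rows that come from the "try" dataset AND are resolved to the kingdom Animalia (resolveKingdomName == "Animalia"), we get more than 5,000 rows (5,311).
However, some of these rows look like plant records. In the glimpse() output, for example, traitNameVerbatim contains "Leaf phenology type" and the verbatim family is "Araucariaceae", while resolvedFamilyName is "Braconidae" and resolvedPhylumName is "Arthropoda".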