1 change: 1 addition & 0 deletions NEWS.md
@@ -5,6 +5,7 @@
- New function `clear_cached_resources()` to remove the session cache and force a reload.
- `load_taxonomic_resources()` now works offline when parquet files have been previously downloaded; `default_version()` falls back to the most recently cached local version when no internet connection is available.
- Internal taxonomic resource tables renamed to snake_case; `family` column added to resource tables.
- Functions `create_species_state_origin_matrix()` and `state_diversity_counts()` now include the parameter `include_infrataxa`, allowing users to select whether the output table contains only species-rank taxa or both species and infraspecific taxa. When `create_species_state_origin_matrix()` is called by `native_anywhere_in_australia()`, `include_infrataxa = TRUE` is set, so infrataxa can also be checked by this function.

# APCalign 1.1.6

2 changes: 1 addition & 1 deletion R/APCalign-package.R
@@ -14,7 +14,7 @@
#' [GitHub repository](https://github.com/traitecoevo/APCalign/issues)
#' @keywords internal
#' @section Functions:
#' **Standarise taxon names**
#' **Standardise taxon names**
#'
#' * [load_taxonomic_resources]
#' * [create_taxonomic_update_lookup]
12 changes: 10 additions & 2 deletions R/align_taxa.R
@@ -333,12 +333,20 @@ align_taxa <- function(original_name,
dplyr::filter(original_name %in% resources$APC_accepted$canonical_name) %>%
dplyr::distinct(original_name) %>%
nrow()


synonym_matches <- taxa$tocheck %>%
dplyr::filter(original_name %in% resources$APC_synonyms$canonical_name) %>%
dplyr::filter(!original_name %in% resources$APC_accepted$canonical_name) %>%
dplyr::distinct(original_name) %>%
nrow()

if(!quiet)
message(
" -> of these ",
crayon::blue(perfect_matches),
" names have a perfect match to a scientific name in the APC.
" names have a perfect match to an accepted scientific name in the APC, and ",
crayon::blue(synonym_matches),
" names have a perfect match to a synonym in the APC.
Alignments being sought for remaining names."
)
}
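
The new message separates exact matches to accepted names from exact matches to synonyms, and a name found in both tables is counted only as accepted. A minimal base-R sketch of that counting logic, using toy vectors as stand-ins for `resources$APC_accepted$canonical_name` and `resources$APC_synonyms$canonical_name` (the values are illustrative placeholders, not real APC content):

```r
# Toy stand-ins for the accepted and synonym name tables (placeholder values).
accepted_names <- c("Aus bus", "Aus cus", "Dus eus")
synonym_names  <- c("Xus bus", "Aus cus")

# Names a user might submit; duplicates are only counted once.
to_check <- c("Aus bus", "Xus bus", "Aus cus", "Auss buss", "Xus bus")
unique_names <- unique(to_check)

# Exact matches to an accepted name.
perfect_matches <- sum(unique_names %in% accepted_names)

# Exact matches to a synonym, excluding names that are also accepted,
# so a name present in both tables is not counted twice.
synonym_matches <- sum(
  unique_names %in% synonym_names & !unique_names %in% accepted_names
)

message(
  perfect_matches, " names match an accepted name and ",
  synonym_matches, " names match a synonym only."
)
```

Without the second exclusion filter, a name present in both tables would be reported in both counts.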
2 changes: 1 addition & 1 deletion R/create_species_state_origin_matrix.R
@@ -24,7 +24,7 @@
#' @seealso \code{\link{load_taxonomic_resources}}
#'
#' @examples
#' \donttest{create_species_state_origin_matrix()}#'
#' \donttest{create_species_state_origin_matrix()}
#' \donttest{create_species_state_origin_matrix(include_infrataxa = TRUE)}
#'
#'
3 changes: 2 additions & 1 deletion R/load_taxonomic_resources.R
@@ -139,7 +139,7 @@ load_taxonomic_resources <-
family,
genus
) %>%
dplyr::arrange(taxonomic_status) %>%
dplyr::arrange(relevel_taxonomic_status_preferred_order(taxonomic_status)) %>%
dplyr::mutate(
## strip_names removes punctuation and filler words associated with
## infraspecific taxa (subsp, var, f, ser)
@@ -233,6 +233,7 @@ load_taxonomic_resources <-
taxonomic_resources[["genera_synonym"]] <-
apc_genera %>%
dplyr::filter(!canonical_name %in% taxonomic_resources$genera_accepted$canonical_name) %>%
dplyr::arrange(relevel_taxonomic_status_preferred_order(taxonomic_status)) %>%
dplyr::mutate(taxonomic_dataset = "APC") %>%
dplyr::distinct(canonical_name, .keep_all = TRUE)
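
Sorting by a preferred status order before `distinct(..., .keep_all = TRUE)` matters because only the first row per `canonical_name` survives. The sketch below reproduces the idea in base R; the `preferred_order` vector is an assumption for illustration, and the package's actual order is whatever `relevel_taxonomic_status_preferred_order()` defines.

```r
# Hypothetical preference order; the package's real order may differ.
preferred_order <- c("accepted", "taxonomic synonym", "misapplied")

# Toy table with one genus name listed under two statuses (placeholder values).
genera <- data.frame(
  canonical_name   = c("Aus", "Aus", "Xus"),
  taxonomic_status = c("misapplied", "taxonomic synonym", "misapplied"),
  stringsAsFactors = FALSE
)

# Sort rows by position in the preferred order rather than alphabetically.
status_rank   <- match(genera$taxonomic_status, preferred_order)
genera_sorted <- genera[order(status_rank), ]

# Keep the first row per name, mirroring distinct(.keep_all = TRUE).
genera_unique <- genera_sorted[!duplicated(genera_sorted$canonical_name), ]
genera_unique
```

With a plain alphabetical sort on `taxonomic_status`, "misapplied" would sort ahead of "taxonomic synonym" and the less useful row would be the one retained for "Aus".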

6 changes: 2 additions & 4 deletions R/native_anywhere_in_australia.R
@@ -19,8 +19,6 @@
#' @param resources An optional list of taxonomic resources to use for the lookup.
#' If not provided, the function will load default taxonomic resources using the
#' `load_taxonomic_resources()` function.
#' @param include_infrataxa option to include subspecies, varieties and forms in the output.
#' Set to false as the default, outputting results just for species-rank taxa.
#'
#' @return A tibble with two columns: `species`, which is the same as the unique values of
#' the input `species`, and `native_anywhere_in_aus`, a vector indicating whether each
@@ -30,10 +28,10 @@
#' @examples
#' \donttest{native_anywhere_in_australia(c("Eucalyptus globulus","Pinus radiata","Banksis notaspecies"))}

native_anywhere_in_australia <- function(species, resources = load_taxonomic_resources(), include_infrataxa = FALSE) {
native_anywhere_in_australia <- function(species, resources = load_taxonomic_resources()) {

# Create lookup tables
full_lookup <- create_species_state_origin_matrix(resources = resources, include_infrataxa = include_infrataxa)
full_lookup <- create_species_state_origin_matrix(resources = resources, include_infrataxa = TRUE)

if(is.null(resources)){
message("Not finding taxonomic resources; check internet connection?")
4 changes: 2 additions & 2 deletions R/standardise_names.R
@@ -118,7 +118,7 @@ standardise_names <- function(taxon_names) {
f("(\\s|\\()s\\.lat\\.(\\s|\\))", "") %>%
f("(\\s|\\()s\\.str\\.(\\s|\\))", "") %>%

## standarise "ser"
## standardise "ser"
f("\\sser(\\s|\\.\\s)", " ser. ") %>%
f("\\sseries(\\s|\\.\\s)", " ser. ") %>%

@@ -133,7 +133,7 @@ standardise_names <- function(taxon_names) {
#' the first two words of the taxon name are extracted (e.g. "x Cynochloris"),
#' while for a non-hybrid genus just the first word is extracted (e.g. "Banksia").
#'
#' @param taxon_name
#' @param taxon_name A character vector of scientific names.
#'
#' @return The genus for a scientific name.
#'
22 changes: 14 additions & 8 deletions R/synonyms_for_accepted_names.R
@@ -1,8 +1,8 @@
#' @title Synonyms for Currently Accepted Names
#'
#' @description
#' This function generates lists a string of synonyms for currently accepted names to facilitate working out past names of a taxon
#' when the current name is known
#' This function generates a string of synonyms for currently accepted species and infraspecific taxa, to help work out past names of a taxon
#' when the current name is known.
#'
#' @param accepted_names A character vector of currently accepted taxon names to look up synonyms for.
#' @param collapse Offering the option to return a long data table with each synonym in its own row,
@@ -32,6 +32,11 @@ synonyms_for_accepted_names <- function(accepted_names, collapse = TRUE, resourc
dplyr::select(accepted_name_usage_ID, accepted_name = canonical_name) |>
dplyr::filter(accepted_name %in% accepted_names)

if(nrow(accepted_names_with_usageID) == 0){
message("None of the taxon names you submitted are accepted by the APC. Look within `resources$APC_accepted` to ensure you have a properly formatted name.")
return(NULL)
}

# preferred order of taxonomic updates (function from `update_taxonomy.R`)
relevel_taxonomic_status_preferred_order <- function(taxonomic_status) {

@@ -66,12 +71,12 @@ synonyms_for_accepted_names <- function(accepted_names, collapse = TRUE, resourc
)
}

# generate list of accepted_name_usage_ID's for accepted species
# Generate list of accepted_name_usage_ID's for accepted species
APC_synonyms_tmp <- resources$APC |>
dplyr::filter(taxon_rank %in% c("species", "variety", "form", "subspecies")) |>
# merge currently accepted names for each taxon onto all the synonyms
dplyr::right_join(accepted_names_with_usageID, by = "accepted_name_usage_ID") |>
dplyr::select(canonical_name, taxonomic_status, accepted_name, accepted_name_usage_ID) |>
dplyr::select(canonical_name, taxonomic_status, accepted_name, accepted_name_usage_ID, taxon_ID) |>
# remove the accepted names themselves
dplyr::filter(taxonomic_status != "accepted") |>
dplyr::mutate(
@@ -93,22 +98,23 @@ synonyms_for_accepted_names <- function(accepted_names, collapse = TRUE, resourc
dplyr::distinct(accepted_name_usage_ID, synonyms)

accepted_names_with_synonyms <- resources$APC |>
dplyr::select(canonical_name, taxon_rank, name_type, genus, family, scientific_name, accepted_name_usage_ID) |>
dplyr::select(canonical_name, family, scientific_name, accepted_name_usage_ID) |>
dplyr::filter(canonical_name %in% accepted_names_with_usageID$accepted_name & accepted_name_usage_ID %in% accepted_names_with_usageID$accepted_name_usage_ID) |>
dplyr::distinct(canonical_name, .keep_all = TRUE) |>
dplyr::left_join(APC_synonyms, by = "accepted_name_usage_ID") |>
dplyr::rename(taxon_name = canonical_name) |>
dplyr::arrange(family, taxon_name)
dplyr::select(family, accepted_name = canonical_name, synonyms, scientific_name, accepted_name_usage_ID) |>
dplyr::arrange(family, accepted_name)

} else {

# Create a long list if collapse = F, with one row per synonym
accepted_names_with_synonyms <- resources$APC |>
dplyr::select(canonical_name, taxon_rank, name_type, genus, family, scientific_name, accepted_name_usage_ID) |>
dplyr::select(canonical_name, family, scientific_name, accepted_name_usage_ID) |>
dplyr::filter(canonical_name %in% accepted_names_with_usageID$accepted_name & accepted_name_usage_ID %in% accepted_names_with_usageID$accepted_name_usage_ID) |>
dplyr::distinct(canonical_name, .keep_all = TRUE) |>
dplyr::select(-canonical_name) |>
dplyr::left_join(APC_synonyms_tmp, by = "accepted_name_usage_ID") |>
dplyr::select(family, accepted_name, synonym = canonical_name, taxonomic_status, scientific_name, accepted_name_usage_ID, taxon_ID) |>
dplyr::arrange(family, accepted_name)
}
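
The two output shapes (one row per accepted name versus one row per synonym) and the new early return can be sketched in base R as follows. Everything here is illustrative: `summarise_synonyms()` and `synonym_table` are hypothetical, the names are placeholders, and the `"; "` separator is an assumption because the collapsing step is not visible in this diff.

```r
# Toy long-format table: one row per synonym (placeholder names).
synonym_table <- data.frame(
  accepted_name = c("Aus bus", "Aus bus", "Dus eus"),
  synonym       = c("Aus cus", "Xus bus", "Dus fus"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the collapse = TRUE / FALSE behaviour.
summarise_synonyms <- function(accepted_names, collapse = TRUE) {
  hits <- synonym_table[synonym_table$accepted_name %in% accepted_names, ]

  # Return early with a message instead of failing on an empty table.
  if (nrow(hits) == 0) {
    message("None of the submitted names were found.")
    return(NULL)
  }

  # Long format: one row per synonym.
  if (!collapse) {
    return(hits)
  }

  # Collapsed format: one row per accepted name, synonyms pasted together.
  collapsed <- vapply(
    split(hits$synonym, hits$accepted_name),
    paste, character(1), collapse = "; "
  )
  data.frame(
    accepted_name = names(collapsed),
    synonyms      = unname(collapsed),
    stringsAsFactors = FALSE
  )
}

collapsed_out <- summarise_synonyms("Aus bus")
long_out      <- summarise_synonyms("Aus bus", collapse = FALSE)
```

Callers of a function with this early return need to handle a `NULL` result, for example with `is.null()`, before piping the output onward.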

2 changes: 1 addition & 1 deletion R/update_taxonomy.R
@@ -600,7 +600,7 @@ update_taxonomy_APC_species_and_infraspecific_taxa <- function(data, resources,
suggested_name = ifelse(!is.na(suggested_collapsed_name), suggested_collapsed_name, suggested_name),
## these are occasionally taxa where the `accepted_name_usage_ID` links to a taxon that is "known" by APC, but doesn't have taxonomic_status = "accepted"
## for these taxa, the suggested name is the `canonical_name` associated with the particular `accepted_name_usage_ID`
suggested_name = ifelse(is.na(suggested_name) & !is.na(taxon_ID), resources$APC_synonyms$canonical_name[match(taxon_ID,resources$APC_synonyms$accepted_name_usage_ID)], suggested_name),
suggested_name = ifelse(is.na(suggested_name) & !is.na(taxon_ID), resources$APC$canonical_name[match(taxon_ID, resources$APC$taxon_ID)], suggested_name),
## if there are no "accepted names" (or similar), the aligned name becomes the suggested name
suggested_name = ifelse(is.na(suggested_name), aligned_name, suggested_name),
taxonomic_status = ifelse(is.na(accepted_name), taxonomic_status_aligned, "accepted"),
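
As I read the `suggested_name` change above, the old lookup searched the synonym table's `accepted_name_usage_ID` column, so `match()` returned the first synonym pointing at that ID rather than the taxon the ID belongs to. The new lookup searches the full APC table by `taxon_ID`. A toy reproduction, assuming (for illustration only) that the synonym table excludes the "known but not accepted" taxon itself; all table contents are placeholders:

```r
# Toy APC table: "id-1" is known to the APC but is not "accepted".
apc <- data.frame(
  taxon_ID               = c("id-1", "id-2", "id-3"),
  canonical_name         = c("Aus bus", "Aus cus", "Aus dus"),
  accepted_name_usage_ID = c("id-1", "id-1", "id-1"),
  taxonomic_status       = c("excluded", "taxonomic synonym", "misapplied"),
  stringsAsFactors = FALSE
)

# Assumed stand-in for resources$APC_synonyms: rows other than the taxon itself.
apc_synonyms <- apc[apc$taxonomic_status != "excluded", ]

taxon_ID <- "id-1"

# Old lookup: first synonym row whose accepted_name_usage_ID equals the ID.
old_lookup <- apc_synonyms$canonical_name[
  match(taxon_ID, apc_synonyms$accepted_name_usage_ID)
]

# New lookup: the row in the full table whose own taxon_ID equals the ID.
new_lookup <- apc$canonical_name[match(taxon_ID, apc$taxon_ID)]

c(old = old_lookup, new = new_lookup)
```

Because `match()` only returns the first hit, the old form depends on row order in the synonym table, while the new form is keyed on a column that should be unique per taxon.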
47 changes: 43 additions & 4 deletions README.Rmd
@@ -26,16 +26,26 @@ library(APCalign)

# APCalign <img src="man/figures/APCalign_hex_2.svg" align="right" width="120"/>

`APCalign` uses the [Australian Plant Census (APC)](https://biodiversity.org.au/nsl/services/search/taxonomy) and [Australian Plant Name Index](https://biodiversity.org.au/nsl/services/search/names) to align and update Australian plant taxon names. 'APCalign' also supplies information about the established status (i.e., native/introduced) of plant taxa within different states/territories as compiled by the APC. It's useful for updating species list and intersecting them with the APC consensus for both taxonomy and establishment status.
When working with biodiversity data, it is important to verify taxonomic names with an authoritative list and correct any out-of-date names or names with typos.

DOI: https://doi.org/10.1071/BT24014
The 'APCalign' package simplifies this process by:

- Accessing up-to-date taxonomic information from the [Australian Plant Census](https://biodiversity.org.au/nsl/services/search/taxonomy) and the [Australian Plant Name Index](https://biodiversity.org.au/nsl/services/search/names)
- Aligning authoritative names to your taxonomic names using our [fuzzy matching algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.html)
- Updating your taxonomic names in a transparent, reproducible manner
- Handling phrase names and aligning disparate phrase name syntax, because APCalign was developed explicitly for the Australian flora
- Indicating when a split leads to uncertainty in a name alignment

'APCalign' also supplies information about the establishment status (i.e., native/introduced) of plant taxa within different states/territories as compiled by the APC. It's useful for updating species lists and intersecting them with the APC consensus for both taxonomy and establishment status.

Read the [APCalign paper](https://doi.org/10.1071/BT24014) to learn more about the motivations for this project and our fuzzy matching and aligning algorithms.

## Installation 🛠️

From CRAN:

```{r install, eval= FALSE}
install.packages("APCalign")
install.packages("APCalign")

library(APCalign)
```
@@ -47,6 +57,8 @@ install.packages("remotes")
remotes::install_github("traitecoevo/APCalign")
```

Or, for the Shiny app, head to [unsw.shinyapps.io/APCalign-app](https://unsw.shinyapps.io/APCalign-app/).

## A quick demo

Generating a look-up table can be done with just one function:
@@ -61,16 +73,28 @@ create_taxonomic_update_lookup(
)
```

If you're going to use `APCalign` more than once, it will save you time to load the taxonomic resources into memory first:
You can alternatively load the taxonomic resources into memory first:

```{r,message=FALSE}
tax_resources <- load_taxonomic_resources()

create_taxonomic_update_lookup(
taxa = c(
"Banksia integrifolia",
"Banksya integrifolla",
"Banksya integriifolla",
"Banksyya integriifolla",
"Banksia red flowers",
"Banksia sp.",
"Banksia catoglypta",
"Dryandra catoglypta",
"Dryandra cataglypta",
"Dryandra australis",
"Acacia longifolia",
"Commersonia rosea",
"Panicum sp. Hairy glumes (C.R.Michell 4192)",
"Panicum sp. Hairy glumes (Michell)",
"Panicum sp. Hairy glumes",
"not a species"
),
resources = tax_resources
@@ -81,9 +105,16 @@ Checking for a list of species to see if they are classified as Australian nativ

```{r, message=FALSE}
native_anywhere_in_australia(c("Eucalyptus globulus","Pinus radiata"), resources = tax_resources)
```

Determining the number of species present in NSW and their establishment means:
```{r, message=FALSE}
state_diversity_counts("NSW", resources = tax_resources)
```

The related function `create_species_state_origin_matrix()` generates a table for all taxa in Australia, indicating their distribution and establishment means, by state.


Getting a family lookup table for genera from the specified taxonomy:

```{r, message=FALSE}
@@ -96,6 +127,14 @@ get_apc_genus_family_lookup(c("Eucalyptus",
resources = tax_resources)
```

Compiling a list of outdated synonyms for currently accepted names:

```{r, message=FALSE}
names_to_check <- c("Acacia aneura", "Banksia nivea", "Cardamine gunnii", "Stenocarpus sinuatus")
synonyms_for_accepted_names(resources = tax_resources, accepted_names = names_to_check, collapse = TRUE)
```


## Cheatsheet

<a href="https://github.com/traitecoevo/APCalign/blob/master/inst/cheatsheet/APCalign-cheatsheet.pdf"><img src="man/figures/APCalign-cheatsheet.png" width="60%"/></a>