Upload link_completeness function and test. Also internal functions #34
Pakillo merged 5 commits into EcologyR:main
Conversation
Summary of Changes

Hello @jmalcan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the package's capabilities by introducing a robust mechanism to assess the completeness and coverage of sampled ecological interactions. It provides a new user-facing function, link_completeness, along with a large set of internal helper functions.
Code Review
This pull request adds a new link_completeness function with associated tests, and a large number of internal helper functions. The addition of tests is great for ensuring the functionality is correct.
My review focuses on improving code clarity, efficiency, and fixing a few bugs in the new internal functions. I've noticed a few patterns across the new code in R/All_internal.R:

- Many for loops are used for data manipulation where vectorized R operations would be much more efficient and idiomatic.
- The ifelse() function is sometimes used for control flow, which is an anti-pattern.
- require() is used inside functions, which is generally discouraged.
I've left specific comments with suggestions for these. I also found a critical bug in visu_funtopol_UNI that would cause a runtime error, and a bug in RN_heatmap_UNI where a parameter is ignored.
Finally, for consistency, it would be good to write the test descriptions in English, as the rest of the project seems to follow this convention.
nodes_list <- funtopol_UNI(int_data)$Functional_classification
if(max(SCCs$csize)>1){
The variable SCCs is used here but it is not defined in the scope of the visu_funtopol_UNI function. This will cause a runtime error. It seems this logic was copied from funtopol_UNI where SCCs is calculated internally. To fix this, you should get the required information from the result of the funtopol_UNI call. For example, you could check the number of core species from the 'Descriptors' data frame returned by funtopol_UNI.
funtopol_output <- funtopol_UNI(int_data)
nodes_list <- funtopol_output$Functional_classification
if (funtopol_output$Descriptors["Num. core species", "Value"] > 1) {

RNc$Ac <- RNc$Canopy
RNc$Ar <- RNc$Recruit
for (i in 1:dim(RNc)[1]) {
  RNc$Ac[i] <- as.numeric(replace(
    RNc$Ac[i],
    match(RN_list, RNc$Canopy[i]),
    cover_df$Ac[match(RNc$Canopy[i], cover_df$Canopy)]
  ))
}

for (i in 1:dim(RNc)[1]) {
  RNc$Ar[i] <- as.numeric(replace(
    RNc$Ar[i],
    match(RN_list, RNc$Recruit[i]),
    cover_df$Ac[match(RNc$Recruit[i], cover_df$Canopy)]
  ))
}
The for loops used to populate the Ac and Ar columns are inefficient as they process the data frame row by row. In R, it's best practice to use vectorized operations for this kind of task, as they are significantly faster and more readable. You can achieve the same result more efficiently by using match on the entire vectors.
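As a quick illustration of the vectorized lookup, here is a toy run with hypothetical species names and cover values (not data from the package):

```r
# Hypothetical cover table and canopy-recruit table
cover_df <- data.frame(Canopy = c("Pine", "Oak", "Open"),
                       Ac     = c(12.5, 8.0, 100.0))
RNc <- data.frame(Canopy  = c("Oak", "Pine", "Open"),
                  Recruit = c("Pine", "Oak", "Oak"))

# match() gives, for each Canopy value in RNc, its row index in cover_df;
# indexing cover_df$Ac with that vector performs the whole join at once.
RNc$Ac <- cover_df$Ac[match(RNc$Canopy,  cover_df$Canopy)]
RNc$Ar <- cover_df$Ac[match(RNc$Recruit, cover_df$Canopy)]

RNc$Ac  # 8.0 12.5 100.0
RNc$Ar  # 12.5 8.0 8.0
```

This replaces both row-by-row loops with two one-line lookups.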
RNc$Ac <- cover_df$Ac[match(RNc$Canopy, cover_df$Canopy)]
RNc$Ar <- cover_df$Ac[match(RNc$Recruit, cover_df$Canopy)]

Effect_int <- c()
for(i in 1:n_tests) {
  ifelse((df$testability[i]>0.05),
    Effect_int[i] <- "Not testable",
    ifelse(df$Significance[i] > 0.05,
      Effect_int[i] <- "Neutral",
      ifelse((df$Fcr[i]/df$Ac[i])>(df$Fro[i]/df$Ao[i]),
        Effect_int[i] <- "Enhancing",
        Effect_int[i] <- "Depressing")
    )
  )
}
Using nested ifelse for multiple conditions with assignments inside is an anti-pattern in R. It makes the code hard to read, debug, and is inefficient because it grows the Effect_int vector inside a loop. A better approach is to use a single vectorized ifelse or dplyr::case_when. This makes the code more idiomatic, faster, and easier to understand. This comment also applies to other similar loops in this file for calculating Significance and Test_type.
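A toy run of the vectorized form, with hypothetical values chosen so each branch is exercised:

```r
# Hypothetical test results (columns follow the reviewed code)
df <- data.frame(testability  = c(0.50, 0.01, 0.01, 0.01),
                 Significance = c(0.90, 0.50, 0.01, 0.01),
                 Fcr = c(1, 1, 10, 1), Ac = c(1, 1, 1, 1),
                 Fro = c(1, 1, 1, 10), Ao = c(1, 1, 1, 1))

# One vectorized expression replaces the row-by-row loop:
df$Effect_int <- ifelse(df$testability > 0.05, "Not testable",
                 ifelse(df$Significance > 0.05, "Neutral",
                 ifelse(df$Fcr / df$Ac > df$Fro / df$Ao,
                        "Enhancing", "Depressing")))

df$Effect_int  # "Not testable" "Neutral" "Enhancing" "Depressing"
```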
df$Effect_int <- ifelse(df$testability > 0.05, "Not testable",
ifelse(df$Significance > 0.05, "Neutral",
ifelse((df$Fcr/df$Ac) > (df$Fro/df$Ao), "Enhancing", "Depressing")))

RN_heatmap_UNI <- function(int_data, weight_var = c("Fcr", "Dcr", "Icr", "Pcr"), scale_top = 1) {
  require(ggplot2)
  # manually set node order
  canopy_order <- unique(int_data$Canopy)
  canopy_order <- canopy_order[!canopy_order %in% c('Open')]
  canopy_order <- c("Open", canopy_order)
  int_data$Canopy2 <- factor(int_data$Canopy, levels = canopy_order)
  recruit_order <- sort(unique(int_data$Canopy), decreasing = TRUE)
  recruit_order <- recruit_order[!recruit_order %in% c('Open')]
  recruit_order <- c(recruit_order, "Open")
  int_data$Recruit2 <- factor(int_data$Recruit, levels = recruit_order)

  # Add recruitment density as another weighting variable
  int_data$Dcr <- int_data$Fcr/int_data$Ac

  # Make weight variable
  int_data$weight <- int_data[weight_var]

  # Lowest (non-zero) and highest values of the weighting variable
  highest_W <- max(int_data$weight)
  lowest_W <- min(int_data$weight[int_data$weight>0])

  # Plot the heatmap
  ggplot(int_data, aes(Canopy2, Recruit2, fill= Dcr)) +
    geom_tile(colour="gray", size=0.25, aes(height = 1)) +
    scale_fill_gradientn(colours = c("#F5F5F5", "#E69F00","#0072B2"), values = c(0,lowest_W, scale_top*highest_W)) +
    # scale_fill_gradient(low="white", high="turquoise3")+
    scale_x_discrete(position = "top") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=0))
}
This function has a few issues related to the weight_var parameter:

- It allows multiple values by default, which can lead to errors. It's better to enforce a single choice using match.arg().
- The fill aesthetic in ggplot is hardcoded to Dcr, ignoring the weight_var parameter. The plot will not reflect the chosen weighting variable.
- The weight column is created but not used directly in the plot aesthetics.
Here is a revised version of the function that addresses these points for correctness and clarity.
RN_heatmap_UNI <- function(int_data, weight_var = c("Fcr", "Dcr", "Icr", "Pcr"), scale_top = 1) {
require(ggplot2)
weight_var <- match.arg(weight_var)
# manually set node order
canopy_order <- unique(int_data$Canopy)
canopy_order <- canopy_order[!canopy_order %in% c('Open')]
canopy_order <- c("Open", canopy_order)
int_data$Canopy2 <- factor(int_data$Canopy, levels = canopy_order)
recruit_order <- sort(unique(int_data$Canopy), decreasing = TRUE)
recruit_order <- recruit_order[!recruit_order %in% c('Open')]
recruit_order <- c(recruit_order, "Open")
int_data$Recruit2 <- factor(int_data$Recruit, levels = recruit_order)
# Add recruitment density as another weighting variable
int_data$Dcr <- int_data$Fcr/int_data$Ac
# Lowest (non-zero) and highest values of the weighting variable
weight_values <- int_data[[weight_var]]
highest_W <- max(weight_values, na.rm = TRUE)
lowest_W <- min(weight_values[weight_values > 0], na.rm = TRUE)
# Plot the heatmap
ggplot(int_data, aes(Canopy2, Recruit2, fill = .data[[weight_var]])) +
geom_tile(colour="gray", size=0.25, aes(height = 1)) +
scale_fill_gradientn(colours = c("#F5F5F5", "#E69F00","#0072B2"), values = c(0,lowest_W, scale_top*highest_W)) +
# scale_fill_gradient(low="white", high="turquoise3")+
scale_x_discrete(position = "top") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=0))
}
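The match.arg() idiom used above can be sketched in isolation. The wrapper below is hypothetical (not part of the package) and only shows the default-collapsing and validation behavior:

```r
# match.arg(): the default is the full vector of choices; match.arg()
# collapses it to the first element unless the caller supplies one of
# the allowed values, and errors on anything else.
pick_weight <- function(weight_var = c("Fcr", "Dcr", "Icr", "Pcr")) {
  weight_var <- match.arg(weight_var)
  weight_var
}

pick_weight()        # "Fcr"  (first choice used as default)
pick_weight("Dcr")   # "Dcr"
# pick_weight("bad") # error: 'arg' should be one of "Fcr", "Dcr", ...
```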
R/All_internal.R (Outdated)
if (!"Plot" %in% names(int_data)) stop("ERROR: your interactions data lacks a column named Plots. This function requires data assembled in plots.")

require(igraph)
Using require() inside a function is generally discouraged in R packages. It's better to list package dependencies in the DESCRIPTION file (e.g., in Imports) and then use the :: operator to call functions from that package (e.g., igraph::graph_from_data_frame). Since you are already using igraph::, the require(igraph) call is redundant and can be removed. This applies to other functions in this file that use require().
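The pattern can be illustrated with a base package (stats is always available, so this runs anywhere; for igraph the same idea applies after declaring it under Imports in DESCRIPTION):

```r
# Qualify the call with :: instead of attaching the package with
# require().  The dependency is then explicit at the call site and
# checked at build time rather than at run time.
x <- c(1, 2, 3, 4)
m <- stats::median(x)   # for igraph: igraph::graph_from_data_frame(...)
m  # 2.5
```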
netRaw <- data.frame(cbind(data_raw$Plot, paste(data_raw$Canopy, data_raw$Recruit)))
colnames(netRaw) <- c("Plot", "Pair")
nPlots <- length(unique(netRaw$Plot))

# Check points.

if (nPlots == 1)
  stop(
    "ERROR: your data is not structured in multiple plots. Incidence approach cannot be used. Try the abundance approach."
  )

if (nPlots < 10)
  warning(
    "WARNING: your are using the incidence approach with very few plots. Consider using the abundance approach if appropriate."
  )

# Combine the lists of canopy and recruit species to obtain the total list of canopy-recruit pairs (links) sampled.

a1 <- split(netRaw[-1], f = netRaw[1])
a2 <- lapply(a1, unique)
a3 <- unlist(unlist(a2, recursive = FALSE, use.names = FALSE))
The code for preparing data for incidence analysis can be made clearer and more robust. Using data.frame(cbind(...)) can lead to unexpected type coercion. Creating the data frame with named arguments is safer. Also, the way pairs are extracted and unlisted is a bit convoluted. Using split() on the vector of pairs directly simplifies the logic.
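A toy run of the simplified pipeline, with hypothetical plot-structured data, shows that splitting the pair vector directly gives the per-plot incidence counts:

```r
# Hypothetical canopy-recruit records across two plots
data_raw <- data.frame(
  Plot    = c(1, 1, 1, 2, 2),
  Canopy  = c("Pine", "Pine", "Oak", "Pine", "Oak"),
  Recruit = c("Oak",  "Oak",  "Oak", "Oak",  "Oak"))

netRaw <- data.frame(Plot = data_raw$Plot,
                     Pair = paste(data_raw$Canopy, data_raw$Recruit))

# One pair list per plot, deduplicated within each plot, then flattened:
plot_pairs            <- split(netRaw$Pair, netRaw$Plot)
unique_pairs_per_plot <- lapply(plot_pairs, unique)
a3 <- unlist(unique_pairs_per_plot, use.names = FALSE)

table(a3)  # "Oak Oak" and "Pine Oak" each occur in 2 plots
```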
netRaw <- data.frame(Plot = data_raw$Plot, Pair = paste(data_raw$Canopy, data_raw$Recruit))
nPlots <- length(unique(netRaw$Plot))
# Check points.
if (nPlots == 1)
stop(
"ERROR: your data is not structured in multiple plots. Incidence approach cannot be used. Try the abundance approach."
)
if (nPlots < 10)
warning(
"WARNING: your are using the incidence approach with very few plots. Consider using the abundance approach if appropriate."
)
# Combine the lists of canopy and recruit species to obtain the total list of canopy-recruit pairs (links) sampled.
plot_pairs <- split(netRaw$Pair, netRaw$Plot)
unique_pairs_per_plot <- lapply(plot_pairs, unique)
a3 <- unlist(unique_pairs_per_plot, use.names = FALSE)

/gemini review
Code Review
This pull request introduces a new user-facing function link_completeness with its tests, and a large number of internal functions. While the new functionality is a valuable addition, the implementation in R/All_internal.R has several significant issues that should be addressed. There is a large amount of duplicated code, particularly between _BI and _BI_COMP suffixed functions, and also between the new link_completeness function and its internal counterpart. There are also several instances of inefficient or non-idiomatic R code, such as using loops where vectorization is possible, and incorrect use of ifelse for control flow. Furthermore, some functions call other functions that are not defined, which will cause errors. Finally, package dependencies are not handled correctly using DESCRIPTION and NAMESPACE, but with require() calls inside functions. I've left specific comments with suggestions for improvement. Addressing these points will greatly improve the code's maintainability, performance, and correctness.
node_degrees_BI <- function(int_data,cover_data){

  matrix_Fcr<-RN_to_matrix(int_data,cover_data, int_type="fac", weight="Fcr")
  matrix_Pcr <-ifelse(matrix_Fcr>0,1,0)
  p<-associndex(int_data,cover_data,expand="no",rm_sp_no_cover="onlycanopy" ,threshold_density = 100)
node_degrees_BI_COMP <- function(int_data,cover_data){

  matrix_Fcr<-RN_to_matrix(int_data,cover_data, int_type="comp", weight="RII")
  matrix_Pcr <-ifelse(matrix_Fcr<0,1,0)
  p<-associndex(int_data,cover_data,expand="yes",rm_sp_no_cover="onlycanopy" ,threshold_density = 100)
link_completeness <- function(int_data = NULL,
                              type = c("incidence", "abundance")) {

  stopifnot(
    c("Plot",
      "Canopy",
      "Recruit",
      "Frequency"
    ) %in% names(int_data))

  type <- match.arg(type)

  data_raw <- int_data
  data_RN <- aggr_RN_UNI(data_raw)

  # Completeness based on incidence data.

  if (type == "incidence") {
    netRaw <- data.frame(cbind(data_raw$Plot, paste(data_raw$Canopy, data_raw$Recruit)))
    colnames(netRaw) <- c("Plot", "Pair")
    nPlots <- length(unique(netRaw$Plot))

    # Check points.

    if (nPlots == 1)
      stop(
        "ERROR: your data is not structured in multiple plots. Incidence approach cannot be used. Try the abundance approach."
      )

    if (nPlots < 10)
      warning(
        "WARNING: your are using the incidence approach with very few plots. Consider using the abundance approach if appropriate."
      )

    # Combine the lists of canopy and recruit species to obtain the total list of canopy-recruit pairs (links) sampled.

    a1 <- split(netRaw[-1], f = netRaw[1])
    a2 <- lapply(a1, unique)
    a3 <- unlist(unlist(a2, recursive = FALSE, use.names = FALSE))

    # Table showing the incidence of each canopy-recruit pair in the study site

    a4 <- table(a3)
    linkIncidence <- as.data.frame(a4)
    colnames(linkIncidence) <- c("Pair", "Incidence")

    # Incidence list to be passed to iNEXT

    data_iNEXT <- c(nPlots, sort(linkIncidence$Incidence, decreasing = TRUE))

    # Call to iNEXT to obtain completeness values

    out <- iNEXT::iNEXT(
      data_iNEXT,
      q = c(0, 1),
      datatype = "incidence_freq",
      se = FALSE,
      size = nPlots
    )
    Lobs <- out$AsyEst[1, 1]
    Lest <- out$AsyEst[1, 2]
    Lest_LCL <- out$AsyEst[1, 4]
    Lest_UCL <- out$AsyEst[1, 5]
    Cq0_L <- Lobs / Lest
    Cq1_L <- out$DataInfo[1, 5]
    df <- data.frame(c(nPlots, Lobs, Lest, Cq0_L, Cq1_L))
    colnames(df) <- c("Incidence based estimate")
    rownames(df) <- c("Num. Plots sampled",
                      "Lobs",
                      "Lest",
                      "Completeness Links (q=0)",
                      "Coverage Links (q=1)")
  }

  # Completeness based on abundance or frequency of recruits.

  if (type == "abundance") {

    warning("Abundance-based approach assumes that each individual recruit provides independent data about the canopy-recruit interaction. If conspecific recruits frequently occur aggregated under individual canopy plants, the estimates of completeness and coverage may be severely overestimated."
    )

    # Call to iNEXT to obtain completeness values

    nPlots <- length(unique(int_data$Plot))

    if (nPlots > 9)
      warning(
        "Your data is structured in multiple plots. Incidence-based approach is recommended."
      )

    out <- iNEXT::iNEXT(data_RN$Fcr[which(data_RN$Fcr > 0)], q = 0, datatype = "abundance")
    nPlants <- out$DataInfo[1,2]
    Lobs <- out$AsyEst[1, 1]
    Lest <- out$AsyEst[1, 2]
    Cq0_L <- Lobs / Lest
    Cq1_L <- out$DataInfo[1, 4]
    df <- data.frame(c(nPlants, Lobs, Lest, Cq0_L, Cq1_L))
    colnames(df) <- c("Abundance based estimate")
    rownames(df) <- c("Num. plants (recruits) sampled",
                      "Lobs",
                      "Lest",
                      "Completeness of links (q=0)",
                      "Coverage of links (q=1)")
  }

  return(df)
}
The implementation of this function is identical to link_completeness_UNI in R/All_internal.R. This code duplication is a significant maintainability issue. If any changes are needed in the future, they would have to be applied in two places, which is error-prone. The user-facing function link_completeness should call the internal function link_completeness_UNI instead of reimplementing it.
link_completeness <- function(int_data = NULL,
type = c("incidence", "abundance")) {
link_completeness_UNI(int_data = int_data, type = type)
}

Effect_int <- c()
for(i in 1:n_tests) {
  ifelse((df$testability[i]>0.05),
    Effect_int[i] <- "Not testable",
    ifelse(df$Significance[i] > 0.05,
      Effect_int[i] <- "Neutral",
      ifelse((df$Fcr[i]/df$Ac[i])>(df$Fro[i]/df$Ao[i]),
        Effect_int[i] <- "Enhancing",
        Effect_int[i] <- "Depressing")
    )
  )
}

df$Effect_int <- Effect_int
This loop with nested ifelse calls for side-effects is inefficient and hard to read. This logic can be expressed much more clearly and efficiently using a single vectorized ifelse statement, which is idiomatic R. This avoids the loop and the incorrect use of ifelse for flow control.
df$Effect_int <- ifelse(df$testability > 0.05, "Not testable",
ifelse(df$Significance > 0.05, "Neutral",
ifelse((df$Fcr / df$Ac) > (df$Fro / df$Ao), "Enhancing", "Depressing")))
R/All_internal.R (Outdated)
if (!"Plot" %in% names(int_data)) stop("ERROR: your interactions data lacks a column named Plots. This function requires data assembled in plots.")

require(igraph)
Using require() inside a package function is not recommended. Package dependencies should be declared in the DESCRIPTION file. To use functions from other packages, either use the :: operator (e.g., igraph::union()) or import the package or specific functions using roxygen tags like @import or @importFrom. This ensures that the package dependencies are properly handled and checked when the package is built and installed. require() can lead to subtle bugs if the package is not available. This issue is present in other functions in this file as well (e.g., cum_values_UNI on line 729, visu_funtopol_UNI on line 959, and RN_heatmap_UNI on line 985).
associndex_UNISITE_BI <- function(int_data = NULL,
                                  threshold_density = 100) {

  if (!"Open" %in% int_data$Canopy) stop("ERROR: tests cannot be conducted because your data does not contain a node named Open or it is spelled differently.")

  thr <- threshold_density

  # Assemble the data
  db_inter <- pre_associndex_UNISITE_BI(int_data)

  # Incorporate density of recruitment (recruits/m2) under each canopy species and in open.
  db_inter$Dcr <- db_inter$Fcr/db_inter$Ac
  db_inter$Dro <- db_inter$Fro/db_inter$Ao

  # Retain the interactions with estimated density below the threshold.
  db_inter <- db_inter[which(db_inter$Dcr<thr & db_inter$Dro<thr), ]

  # Obtain the maximum recruitment density for each recruit under the canopy species or in open.
  db_inter$Max_Recr_Density <- pmax(db_inter$Dcr,db_inter$Dro)

  db_inter <- utils::type.convert(db_inter, as.is = TRUE)

  max_rd <- stats::aggregate(Max_Recr_Density ~ Recruit, data = db_inter, FUN = "max")

  # Add a variable max_Recr to each pair indicating the maximum recruitment density of the recruit species in the study site
  Recr_list <- sort(unique(c(db_inter$Recruit)))
  Dens_list <- sort(unique(max_rd$Recruit))
  lack_dens <- setdiff(Recr_list, Dens_list)

  db_inter$max_Recr <- db_inter$Recruit
  for (i in 1:(dim(db_inter)[1])) {
    db_inter$max_Recr[i] <- replace(
      db_inter$max_Recr[i],
      match(Recr_list, db_inter$max_Recr[i]),
      max_rd$Max_Recr_Density[match(db_inter$max_Recr[i], max_rd$Recruit)]
    )
  }

  db_inter <- utils::type.convert(db_inter, as.is = TRUE)

  # Calculate indices Ns, NintC, NintA and RII
  db_inter$Ns <- (db_inter$Dcr - db_inter$Dro)/db_inter$max_Recr
  db_inter$NintC <- 2*(db_inter$Dcr - db_inter$Dro)/((db_inter$Dcr + db_inter$Dro)+abs(db_inter$Dcr-db_inter$Dro))
  db_inter$NintA <- 2*(db_inter$Dcr - db_inter$Dro)/((db_inter$Dro) + abs(db_inter$Dcr-db_inter$Dro))
  db_inter$RII <- (db_inter$Dcr - db_inter$Dro)/(db_inter$Dcr + db_inter$Dro)

  removed <- names(db_inter) %in% c("max_Recr", "Max_Recr_Density")
  db_inter <- db_inter[!removed]
  return(db_inter)

}
The function associndex_UNISITE_BI is almost an exact duplicate of associndex_UNISITE_BI_COMP. The only difference is that associndex_UNISITE_BI_COMP calls pre_associndex_UNISITE_BI_COMP instead of pre_associndex_UNISITE_BI. This duplication makes the code harder to maintain. Consider refactoring these two functions into a single function that accepts the pre-processing function as an argument, or uses a parameter to decide which pre-processing function to call.
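The refactor could look roughly like the sketch below. All names here are illustrative stand-ins, not the package's actual API; the point is only the pattern of selecting the pre-processing step via a mode argument while the shared body appears once:

```r
# Hypothetical merge of two near-duplicate functions: the mode argument
# picks the pre-processing function, and the common logic runs once.
process <- function(x, mode = c("facilitation", "competition")) {
  mode <- match.arg(mode)
  pre  <- switch(mode,
                 facilitation = function(v) v + 1,  # stand-in for pre_..._BI
                 competition  = function(v) v - 1)  # stand-in for pre_..._BI_COMP
  pre(x)  # shared body operates on the pre-processed data
}

process(10)                         # 11
process(10, mode = "competition")   # 9
```

Any future fix to the shared body then lands in exactly one place.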
int_significance_BI <- function(int_data){

  if (!"Open" %in% int_data$Canopy) stop("ERROR: tests cannot be conducted because your data does not contain a node named Open or it is spelled differently.")

  df <- pre_associndex_UNISITE_BI(int_data)
  n_tests <- dim(df)[1]
  df$exp_p <- df$Ac/(df$Ac+df$Ao) # Expected probability of success (i.e. of recruiting under canopy)

  # Testability through Binomial test

  df$Ftot <- df$Fcr+df$Fro

  extreme_p <- c()
  for(i in 1:n_tests){
    extreme_p[i] <- min(df$exp_p[i], 1-df$exp_p[i])
  }
  df$extreme_p <- extreme_p

  testability <- c()
  for(i in 1:n_tests) {
    testability[i] <- binom.test(df$Ftot[i], df$Ftot[i], df$extreme_p[i], alternative ="two.sided")$p.value
  }
  df$testability <- testability

  # Binomial (or Chi square) Test Significance

  Significance <- c()
  for(i in 1:n_tests) {
    ifelse(((df$Fcr[i]+df$Fro[i])*(df$Ac[i]/(df$Ac[i]+df$Ao[i]))<=5 | (df$Fcr[i]+df$Fro[i])*(df$Ao[i]/(df$Ac[i]+df$Ao[i]))<=5),
      Significance[i] <- binom.test(df$Fcr[i], df$Fcr[i]+df$Fro[i], df$exp_p[i], alternative ="two.sided")$p.value,
      Significance[i] <- chisq.test(c(df$Fcr[i], df$Fro[i]), p = c(df$exp_p[i], 1-df$exp_p[i]))$p.value
    )
  }
  df$Significance <- Significance

  Test_type <- c()
  for(i in 1:n_tests) {
    ifelse(((df$Fcr[i]+df$Fro[i])*(df$Ac[i]/(df$Ac[i]+df$Ao[i]))<=5 | (df$Fcr[i]+df$Fro[i])*(df$Ao[i]/(df$Ac[i]+df$Ao[i]))<=5),
      Test_type[i] <- "Binomial",
      Test_type[i] <- "Chi-square"
    )
  }
  df$Test_type <- Test_type
  # if(length(unique(df$Test_type))>1) warning("Different tests were used for different canopy-recruit pairs. Check column Test_type")

  Effect_int <- c()
  for(i in 1:n_tests) {
    ifelse((df$testability[i]>0.05),
      Effect_int[i] <- "Not testable",
      ifelse(df$Significance[i] > 0.05,
        Effect_int[i] <- "Neutral",
        ifelse((df$Fcr[i]/df$Ac[i])>(df$Fro[i]/df$Ao[i]),
          Effect_int[i] <- "Enhancing",
          Effect_int[i] <- "Depressing")
      )
    )
  }

  df$Effect_int <- Effect_int
  drops <- c("exp_p", "Ftot", "extreme_p")
  df <- df[ , !(names(df) %in% drops)]

  return(df)
}
This function int_significance_BI is almost an exact copy of int_significance_BI_COMP, with the only difference being the call to pre_associndex_UNISITE_BI() instead of pre_associndex_UNISITE_BI_COMP(). This code duplication should be avoided to improve maintainability. You could merge them into a single function that takes an argument to determine which preprocessing function to use.
mydata[which(is.na(mydata$Fro)), "Fro"]<-0
mydata[which(is.na(mydata$Ao)), "Ao"]<-min(mydata$Ao, na.rm=T)
Using which() with logical indexing is redundant. You can directly use the logical vector for subsetting. This makes the code more concise and is more idiomatic in R. This pattern also appears in pre_associndex_UNISITE_BI_COMP at lines 1248-1249.
mydata[is.na(mydata$Fro), "Fro"] <- 0
mydata[is.na(mydata$Ao), "Ao"] <- min(mydata$Ao, na.rm = TRUE)
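A toy data frame (hypothetical values) confirms the logical-indexing form behaves the same as the which() version:

```r
# Hypothetical data with missing recruitment frequency and open area
mydata <- data.frame(Fro = c(3, NA, 7), Ao = c(NA, 2, 5))

# A logical vector indexes rows directly; wrapping it in which() is redundant.
mydata[is.na(mydata$Fro), "Fro"] <- 0
mydata[is.na(mydata$Ao),  "Ao"]  <- min(mydata$Ao, na.rm = TRUE)

mydata$Fro  # 3 0 7
mydata$Ao   # 2 2 5
```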