Add the rest of functions and tests#35
Conversation
Summary of ChangesHello @jmalcan, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the R package's capabilities for analyzing ecological interaction networks. It introduces a broad range of functions designed to process, analyze, and visualize recruitment, facilitation, and competition data, providing researchers with advanced tools to understand network structure, interaction significance, and species roles within these complex systems. The additions aim to streamline workflows and offer deeper insights into ecological dynamics. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request adds a substantial amount of new functionality and tests. The code is generally functional, but there are several areas for improvement, primarily concerning code duplication and maintainability.
- Massive Code Duplication: Several pairs of files are nearly identical (e.g.,
topol_fac.R/topol_depre.R,visu_topol_fac.R/visu_topol_depre.R). Many functions also contain large blocks of duplicated code (e.g.,cum_values.R,visu_net.R). This makes the codebase difficult to maintain. These should be refactored into single, parameterized functions. - Maintainability: There are many instances of
if/if/ifcontrol structures that should beif/else if. The use ofrequire()inside functions is discouraged in packages. Complex nestedifelsestatements should be rewritten as standardif/elseblocks for clarity. - Testing: The tests rely on downloading data from the internet (
download_RN()) and change the working directory (setwd()). This makes the tests fragile, slow, and not self-contained. Test data should be included with the package.
I've identified several specific instances of these issues in the comments below. Addressing them will significantly improve the quality and long-term maintainability of the code.
| #' @examples | ||
| #' | ||
| #' | ||
| topol_fac <- function(int_data,cover_data, direction=c("in","out")){ |
There was a problem hiding this comment.
This file is almost an exact copy of R/topol_depre.R. The only significant difference is the int_type passed to RN_to_matrix on line 18. This massive code duplication makes the code base very difficult to maintain, as any bug fix or improvement would need to be applied in two places.
Please merge topol_fac and topol_depre into a single function that accepts an int_type parameter ("fac" or "comp"). This will eliminate over 300 lines of duplicated code.
| #' @examples | ||
| #' | ||
| #' | ||
| visu_topol_fac <- function(int_data, |
There was a problem hiding this comment.
This file is a near-duplicate of R/visu_topol_depre.R. The main differences are the int_type passed to RN_to_matrix and the plot title. This duplication makes maintenance difficult.
Please merge visu_topol_fac and visu_topol_depre into a single function that accepts an int_type parameter to specify whether to visualize the 'facilitation' or 'depression' network. The plot title can be set dynamically based on this parameter.
| if(property=="vcount"){ | ||
|
|
||
| part_RNs <- partial_RNs_UNI(int_data, k) | ||
| nSteps <- length(part_RNs) | ||
| borrar <- unlist(part_RNs, recursive = FALSE) | ||
| df <- data.frame(unlist(lapply(borrar, property))) | ||
| colnames(df) <- c("Value") | ||
| df$sampleSize <- sort(rep(c(1:nSteps),k)) | ||
| plot_cumm_value <- ggplot(df, aes(x=as.factor(sampleSize), y=Value)) + | ||
| geom_jitter(colour="turquoise3", alpha=0.5, height = 0, width=0.1) + | ||
| geom_point(stat="summary", fun="mean") + | ||
| geom_errorbar(stat="summary", fun.data="mean_se", fun.args = list(mult = 1.96), width=0.3) + | ||
| labs(x="Sample Size (Num. Plots)", y="Value (mean + 95%CI)") + | ||
| ggtitle("Number of species") | ||
|
|
||
| outputs <- list("Data" = df, "Plot" = plot_cumm_value) | ||
| return(outputs) | ||
|
|
||
| } | ||
|
|
||
| if(property=="ecount"){ | ||
|
|
||
| part_RNs <- partial_RNs_UNI(int_data, k) | ||
| nSteps <- length(part_RNs) | ||
| borrar <- unlist(part_RNs, recursive = FALSE) | ||
| df <- data.frame(unlist(lapply(borrar, property))) | ||
| colnames(df) <- c("Value") | ||
| df$sampleSize <- sort(rep(c(1:nSteps),k)) | ||
| plot_cumm_value <- ggplot(df, aes(x=as.factor(sampleSize), y=Value)) + | ||
| geom_jitter(colour="turquoise3", alpha=0.5, height = 0, width=0.1) + | ||
| geom_point(stat="summary", fun="mean") + | ||
| geom_errorbar(stat="summary", fun.data="mean_se", fun.args = list(mult = 1.96), width=0.3) + | ||
| labs(x="Sample Size (Num. Plots)", y="Value (mean + 95%CI)") + | ||
| ggtitle("Number of interactions") | ||
|
|
||
| outputs <- list("Data" = df, "Plot" = plot_cumm_value) | ||
| return(outputs) | ||
|
|
||
| } | ||
|
|
||
| if(property=="edge_density"){ | ||
|
|
||
| part_RNs <- partial_RNs_UNI(int_data, k) | ||
| nSteps <- length(part_RNs) | ||
| borrar <- unlist(part_RNs, recursive = FALSE) | ||
| df <- data.frame(unlist(lapply(borrar, property))) | ||
| colnames(df) <- c("Value") | ||
| df$sampleSize <- sort(rep(c(1:nSteps),k)) | ||
| plot_cumm_value <- ggplot(df, aes(x=as.factor(sampleSize), y=Value)) + | ||
| geom_jitter(colour="turquoise3", alpha=0.5, height = 0, width=0.1) + | ||
| geom_point(stat="summary", fun="mean") + | ||
| geom_errorbar(stat="summary", fun.data="mean_se", fun.args = list(mult = 1.96), width=0.3) + | ||
| labs(x="Sample Size (Num. Plots)", y="Value (mean + 95%CI)") + | ||
| ggtitle("Connectance") | ||
|
|
||
| outputs <- list("Data" = df, "Plot" = plot_cumm_value) | ||
| return(outputs) | ||
|
|
||
| } |
There was a problem hiding this comment.
There is a large amount of code duplication across the three if blocks for each property. The data processing and ggplot creation logic are nearly identical, with only the ggtitle changing. This should be refactored into a single block of code to improve maintainability. You can use a named vector or a switch statement to select the appropriate title based on the property.
property <- match.arg(property)
part_RNs <- partial_RNs_UNI(int_data, k)
nSteps <- length(part_RNs)
borrar <- unlist(part_RNs, recursive = FALSE)
df <- data.frame(Value = unlist(lapply(borrar, property)))
df$sampleSize <- sort(rep(c(1:nSteps),k))
plot_titles <- c(
vcount = "Number of species",
ecount = "Number of interactions",
edge_density = "Connectance"
)
plot_cumm_value <- ggplot(df, aes(x=as.factor(sampleSize), y=Value)) +
geom_jitter(colour="turquoise3", alpha=0.5, height = 0, width=0.1) +
geom_point(stat="summary", fun="mean") +
geom_errorbar(stat="summary", fun.data="mean_se", fun.args = list(mult = 1.96), width=0.3) +
labs(x="Sample Size (Num. Plots)", y="Value (mean + 95%CI)") +
ggtitle(plot_titles[property])
outputs <- list("Data" = df, "Plot" = plot_cumm_value)
return(outputs)
| Significance <- c() | ||
| for(i in 1:n_tests) { | ||
| ifelse(((df$Fc[i]+df$Fro[i])*(df$Ac[i]/(df$Ac[i]+df$Ao[i]))<=5 | (df$Fc[i]+df$Fro[i])*(df$Ao[i]/(df$Ac[i]+df$Ao[i]))<=5), | ||
| Significance[i] <- binom.test(df$Fc[i], df$Fc[i]+df$Fro[i], df$exp_p[i], alternative ="two.sided")$p.value, | ||
| Significance[i] <- chisq.test(c(df$Fc[i], df$Fro[i]), p = c(df$exp_p[i], 1-df$exp_p[i]))$p.value | ||
| ) | ||
| } | ||
| df$Significance <- Significance | ||
|
|
||
| Test_type <- c() | ||
| for(i in 1:n_tests) { | ||
| ifelse(((df$Fc[i]+df$Fro[i])*(df$Ac[i]/(df$Ac[i]+df$Ao[i]))<=5 | (df$Fc[i]+df$Fro[i])*(df$Ao[i]/(df$Ac[i]+df$Ao[i]))<=5), | ||
| Test_type[i] <- "Binomial", | ||
| Test_type[i] <- "Chi-square" | ||
| ) | ||
| } | ||
| df$Test_type <- Test_type | ||
|
|
||
| if(length(unique(df$Test_type))>1) message("Different tests were used for different canopy-recruit pairs. Check column Test_type") | ||
|
|
||
| Effect_int <- c() | ||
| for(i in 1:n_tests) { | ||
| ifelse((df$testability[i]>0.05), | ||
| Effect_int[i] <- "Not testable", | ||
| ifelse(df$Significance[i] > 0.05, | ||
| Effect_int[i] <- "Neutral", | ||
| ifelse((df$Fc[i]/df$Ac[i])>(df$Fro[i]/df$Ao[i]), | ||
| Effect_int[i] <- "Facilitative", | ||
| Effect_int[i] <- "Depressive") | ||
| ) | ||
| ) | ||
| } |
There was a problem hiding this comment.
The nested ifelse statements, especially the one for Effect_int, are very difficult to read and maintain. Using ifelse for assignments within a loop like this is also unconventional and can be error-prone. It's better to use standard if/else blocks for clarity and correctness.
Additionally, the condition for choosing between binom.test and chisq.test is duplicated. This condition should be calculated once per loop iteration and stored in a variable.
Significance <- numeric(n_tests)
Test_type <- character(n_tests)
Effect_int <- character(n_tests)
for(i in 1:n_tests) {
is_small_sample <- ((df$Fc[i]+df$Fro[i])*(df$Ac[i]/(df$Ac[i]+df$Ao[i]))<=5 | (df$Fc[i]+df$Fro[i])*(df$Ao[i]/(df$Ac[i]+df$Ao[i]))<=5)
if (is_small_sample) {
Significance[i] <- binom.test(df$Fc[i], df$Fc[i]+df$Fro[i], df$exp_p[i], alternative ="two.sided")$p.value
Test_type[i] <- "Binomial"
} else {
Significance[i] <- chisq.test(c(df$Fc[i], df$Fro[i]), p = c(df$exp_p[i], 1-df$exp_p[i]))$p.value
Test_type[i] <- "Chi-square"
}
if (df$testability[i] > 0.05) {
Effect_int[i] <- "Not testable"
} else if (df$Significance[i] > 0.05) {
Effect_int[i] <- "Neutral"
} else if ((df$Fc[i]/df$Ac[i]) > (df$Fro[i]/df$Ao[i])) {
Effect_int[i] <- "Facilitative"
} else {
Effect_int[i] <- "Depressive"
}
}
df$Significance <- Significance
df$Test_type <- Test_type
| plot(RN_igraph, | ||
| edge.arrow.size=.3, | ||
| edge.width = E(RN_igraph)$weight*scale, | ||
| vertex.color="chartreuse", | ||
| vertex.size=8, | ||
| vertex.frame.color="darkolivegreen", | ||
| vertex.label.color="black", | ||
| vertex.label.cex=0.8, | ||
| vertex.label.dist=2, | ||
| vertex.label.font = 3, | ||
| edge.curved=0.2, | ||
| #layout=layout_with_kk(RN_igraph), | ||
| layout=layout_in_circle(RN_igraph), | ||
| frame = TRUE) |
There was a problem hiding this comment.
The plot() call with its many arguments is duplicated across three different if blocks in this function. This makes the code harder to maintain, as any change to the plotting style needs to be replicated in multiple places.
Consider extracting the plotting logic into a helper function to avoid this repetition.
| res <- funtopol_rec(mysite_com, mysite_cov) | ||
| fc <- res$Functional_classification | ||
|
|
||
| a<-sort(a$Functional_classification$id) |
| #The size of a network can be described by the number of nodes (*N*) and links (*L*), and its complexity is #proportional to network connectance. *C* can be estimated on different ways depending on the type of #network, but it always measures the proportion of links relative to the maximum number of links that could #be possible in the network. In the case of recruitment networks we use the formula $C = L /(N^2-N)$ since #the node "Open" does not act as a recruit (i.e. Open is represented by a row of zeroes in the adjacency #matrix). | ||
| #For facilitation and competition that are bipartite networks, the nodes in eahc guild are #provided( nurse/facilitated, Canopy (depressor)/recruit) seprately, and open is not a node | ||
| #' | ||
| RN_dims <- function(int_data,cover_data, int_type=c("rec","fac","comp")){ |
There was a problem hiding this comment.
For better code structure and robustness, it's recommended to validate the int_type argument using match.arg() at the beginning of the function. Additionally, the series of if statements should be converted to an if/else if chain. This is more efficient as it avoids unnecessary checks once a condition is met, and it more clearly expresses the mutually exclusive nature of the code blocks.
RN_dims <- function(int_data,cover_data, int_type=c("rec","fac","comp")){
int_type <- match.arg(int_type)
| #' @examples | ||
| #' | ||
| #' | ||
| RN_heatmap <- function(int_data,cover_data,int_type=c("rec","fac","comp"), weight = c("Pcr","Fcr","Dcr","Dro","Ns", "NintC", "NintA", "RII"), scale_top = 1) { |
There was a problem hiding this comment.
The function parameter scale_top is defined but never used within the function body. This constitutes dead code and should be removed to improve clarity.
Additionally, using require(ggplot2) inside a function is not recommended for package development. It's better to list ggplot2 in the Imports field of your DESCRIPTION file and then either use ggplot2:: to call functions or use @importFrom roxygen tags.
RN_heatmap <- function(int_data,cover_data,int_type=c("rec","fac","comp"), weight = c("Pcr","Fcr","Dcr","Dro","Ns", "NintC", "NintA", "RII")) {| filter_paths <- function(paths) { | ||
|
|
||
| lens <- sapply(paths, length) | ||
| ord <- order(lens, decreasing = TRUE) | ||
| paths <- paths[ord] | ||
|
|
||
| keep <- rep(TRUE, length(paths)) | ||
|
|
||
| for (i in seq_along(paths)) { | ||
|
|
||
| if (!keep[i]) next | ||
|
|
||
| # compare with paths already aceptaded (& longer) | ||
| if (i > 1) { | ||
| for (j in which(keep)[which(keep) < i]) { | ||
| if (es_subpath(paths[[i]], paths[[j]])) { | ||
| keep[i] <- FALSE | ||
| break | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| paths[keep] | ||
| } |
| SCCs <- igraph::components(g, mode = "strong") | ||
|
|
||
|
|
||
| core<-funtopol_UNI(int_data)$Descriptors[which(rownames(funtopol_UNI(int_data)$Descriptors)=="Num. core species"),] |
There was a problem hiding this comment.
The function funtopol_UNI() is called twice (here and on line 25). This is inefficient, as the computation is performed twice. The result of the first call should be stored in a variable and reused.
funtopol_results <- funtopol_UNI(int_data)
nodes_list <- funtopol_results$Functional_classification
int_data_plot<-int_significance(int_data0,cover_data0, int_type=c("rec"))
int_data_plot <- int_data_plot[, c("Canopy", "Recruit", setdiff(names(int_data_plot), c("Canopy","Recruit")))]
g <- igraph::graph_from_data_frame(int_data_plot, directed = TRUE)
g <- igraph::simplify(g, remove.multiple = TRUE, remove.loops = FALSE)
SCCs <- igraph::components(g, mode = "strong")
core<-funtopol_results$Descriptors[which(rownames(funtopol_results$Descriptors)=="Num. core species"),]
No description provided.