Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQA equations in compareToStateEnvelope() #3

Open
JimWhiting91 opened this issue Aug 17, 2024 · 3 comments
Open

EQA equations in compareToStateEnvelope() #3

JimWhiting91 opened this issue Aug 17, 2024 · 3 comments

Comments

@JimWhiting91
Copy link

Hello,

I am interested in applying the EQA framework to some data, however I have some questions regarding the equations, specifically this part of the compareToStateEnvelope()

D2E[i] <- sum(d_i^2)/length(d_i) - 0.5*var_env

The bit I am specifically stuck with is why we subtract 0.5 of var_env. I have worked through the paper here (https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecs2.4726) to try and understand why D2E is derived this way, however I am still not sure. The paper makes a specific statement that seems to contradict the equation in its current form, it states:

'u(y, E) = 0.5 whenever D(y, E)2 = Var(E) meaning that the squared distance between the assessed ecosystem and (the centroid of) the reference envelope is equal to the average, across reference states, of the same squared distance.'

This implies that D2E == the squared distance of an ecological state to the reference envelope centroid. However, we can put together a 2D reprex to show that this isn't the case. Following the equation for D2E, D2E != the squared distance to the reference/target envelope, it is always out by a factor of 0.5 x var_env, which implies we should be subtracting var_env rather than 0.5 x var_env. Using the current equation for D2E, we can see that an ecological state within the reference envelope (m3 in the reprex), gets a value for D2E that is greater than Var(E), in contradiction to the statement highlighted above.

A simpler way to think about it is to imagine y is an ecological state located at the reference envelope centroid. By definition, if D2E == the squared distance of y to reference centroid, D2E must be 0 when y is at the centroid. However the current equation for D2E returns a value of 0.5*var_env.

It is possible that I am missing something here, I couldn't actually find any reasoning in the paper for why the equation foe D2E is what it is, as it is not mentioned in the citation included for Legendre & De Cáceres 2013 (as far as I could tell). There therefore may be a good reason for subtracting 0.5*var_env instead of var_env. However, given the written statements for what D2E represents and the goals of the general framework, the current D2E calculation seems incorrect. The consequence for this is that the current equations overestimate the distance of ecological conditions to the reference envelope, and so they receive lower Q values. Please see the reprex below, happy to chat more about this and I apologise in advance if there is a good reason for the current formulation.

library(ggplot2)
library(data.table)

# Calculate some dummy vars 
dummy = data.table(point = c('a','b','c','d','centroid','m1','m2','m3'),                    
                   x = c(-10,10,10,-10,0,-5,20,9),                    
                   y = c(10,10,-10,-10,0,0,10,9),                    
                   type = c('reference','reference','reference','reference','ref_centroid','management','management','management')) 
# plot the example 
ggplot(dummy,aes(x,y,colour = type,shape = type)) +   
  geom_point(size = 5) 
# In this example, you have to imagine the envelope is a circle 
# that passes through the reference points 
# m1 is within the envelope, m2 is outside 
# Sum of squares for targets 
SS = apply(dummy[1:4,2:3],2,function(x) sum((x - mean(x))^2)) 
sum(SS)
sum(SS)/3

# Make a matrix of just the x,y
dummy_mat = as.matrix(dummy[,2:3]) 
rownames(dummy_mat) = dummy$point
# Calculate the squared distance between all points
dummy_sq_dist = as.matrix(dist(dummy_mat) ^2) 
diag(dummy_sq_dist) = NA 
dummy_sq_dist

# Just look at this dummy_sq_dist 
# From this matrix, we can take Var(E) as the mean squared distance of each target 
# to the centroid
# In practise its not exactly this because we divide by n-1, not n
varE = mean(dummy_sq_dist['centroid',c('a','b','c','d')]) 
# Assume we are working with m1 
# We first calculate the sum of squared distances between m1 and abcd 
mean(dummy_sq_dist['m1',c('a','b','c','d')])
# this gives 225, but to get a measure of d in terms of the squared distance to the target 
# envelope we need to subtract the entire squared distance of each of abcd to the abcd centroid 
# which is 200. 
# this gives us 25, which == the squared distance of m1 to the target centroid 
mean(dummy_sq_dist['m1',c('a','b','c','d')]) - varE
dummy_sq_dist['m1','centroid']
# The EQA equations would have us calculate as, which is out by 0.5 * varE
mean(dummy_sq_dist['m1',c('a','b','c','d')]) - (0.5 * varE) 

# Same for m2 and m3
mean(dummy_sq_dist['m2',c('a','b','c','d')]) - varE # 500
dummy_sq_dist['m2','centroid'] # 500
mean(dummy_sq_dist['m2',c('a','b','c','d')]) - 0.5 * varE # 600

# m3 is inside the envelope, but gets a D2E value greater than varE
mean(dummy_sq_dist['m3',c('a','b','c','d')]) - varE # 162
dummy_sq_dist['m3','centroid'] # 162
mean(dummy_sq_dist['m3',c('a','b','c','d')]) - 0.5 * varE # 262

# We can also calculate D2E for the centroid, which should be 0
mean(dummy_sq_dist['centroid',c('a','b','c','d')]) - varE
# EQA equation
mean(dummy_sq_dist['centroid',c('a','b','c','d')]) - (0.5 * varE)
@miquelcaceres
Copy link
Member

Dear Jim,

Thanks for your issue. You are totally right! There is a mistake in the paper and 1/2 should be removed from eqs. 3, 4 and 5. The mistake arised because, while eqs. 1 and 2 are correct in the paper, our implementation of VarE in ecotraj was doing the summation of all d_ij^2 elements whereas the equations indicates summation over the upper (or lower) triangle of the squared distance matrix. Thus, our estimation of VarE in the code was doubled and I divided by 2 in its application to estimate d2E. The net effect is that d2E is correctly estimated in ecotraj, but not VarE, and the equations 3, 4 and 5 should be corrected. We will have to send a correction.

I will double check all this, but I am quite sure that this is the problem.

Thanks again,
Miquel

@JimWhiting91
Copy link
Author

Hi Miquel,

Thanks for checking this, I'm glad I wasn't just misunderstanding something and appreciate your commitment to sending a correction over. I did suspect whether it was something to do with double-counting in the original distance matrix. That'd make complete sense and easy to miss.

Looking at the code in referenceEnvelopes(), it looks like var_env is being calculated using a dist matrix (enforced by the as.dist), so this should only be the lower half of the total matrix, here

  d_env <- as.dist(as.matrix(d)[sel_env, sel_env])
  var_env <- .stateEnvelopeVar(d_env)

Can show this in a quick example

d = matrix(rnorm(100),nrow = 10,ncol = 10)
length(as.vector(d)) # 100
length(as.vector(as.dist(d))) # 45

I think then when d is converted to a vector in stateEnvelopeVar, it should already be accounting for only counting the lower half of the matrix.

.stateEnvelopeVar <-function(d){
  r <- ncol(as.matrix(d))
  return(sum(as.vector(d)^2)/(r*(r-1)))
}

If this is the case, I think your estimates of var_env are correct, and so the 0.5 * var_env in compareToStateEnvelope() would need amending.

All the best,
Jim

@miquelcaceres
Copy link
Member

Hi Jim,

I corrected the code in functions compareToStateEnvelope() and compareToTrajectoryEnvelope(). I also added checks for the equality of the values of var_env and the average squared distance to centroid. The vignette illustrating the method now produces results that are slightly different from those published.

We will have to submit a corrigendum to the journal at some point, to inform about the mistakes in equations.

Best wishes,
Miquel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants