Update flash_init.R #137

YushaLiu · 2025-01-21T04:24:10Z

A minor edit to allow flash_init to take a value for Y2 rather than have flashier compute it, which throws out the error CHOLMOD error 'problem too large' when the number of cells is too large (e.g., > 100k).

pcarbo · 2025-01-21T16:34:58Z

@YushaLiu I'm just summarizing the changes that @willwerscheid has requested:

Rename "Y2val" as "Y2".
Add roxygen2 documentation for the new ("Y2") argument.

I believe if you update your fork, the changes will automatically be reflected in this Pull Request.

pcarbo · 2025-01-22T16:05:44Z

@YushaLiu It isn't quite clear from your description what Y2 should be; maybe you can give a simple example in R code of how to set Y2 naively? Is Y2 supposed to be a matrix, vector, or scalar?

YushaLiu · 2025-01-22T18:33:58Z

@pcarbo I updated the description. Does it look clear now?

pcarbo · 2025-01-22T19:09:15Z

Looks okay to me now, thanks. @willwerscheid I'll let you take care of the merge when you are okay with Yusha's changes.

willwerscheid · 2025-01-22T19:23:06Z

thanks, looks much better. this part is not accurate though:

#' rather than having \code{flashier} compute \code{Y2}, which involves explicitly
#' forming the \eqn{n \times n} dense matrix \eqn{Y} and can be much slower and
#' even cause memory issues.

flashier does not do this - if a low-rank representation Y = XX' is used then Y2 is computed as sum(X * (X(X’X)) [edited] - see sq.nmode.prod.r1.lowrank() in data_lowrank.R. So I guess now I am wondering whether we actually need this change? This calculation should not be too different from what you suggest doing in the example (it is done this way in case Y = X_1 X_2' with X_1 \ne X_2). But I did notice that we might be using base crossprod instead of Matrix::crossprod... maybe that is causing your issues? Or maybe we need to handle the special case where X_1 = X_2? What do you think @pcarbo @YushaLiu ? I would like to avoid adding an argument to flash_init() if we can instead automatically calculate Y2 more efficiently.

pcarbo · 2025-01-22T19:46:52Z

I agree with you @willwerscheid, especially if it is an easy fix!

willwerscheid · 2025-01-23T19:18:45Z

The issue is that X(X'X) is not in fact sparse so the current method for computing Y2 is indeed very inefficient. We should handle the case where X_1 = X_2 separately (since it can then be computed very efficiently); we should also see whether it is possible to do the general case and still preserve sparsity.

pcarbo · 2025-01-23T19:30:27Z

Also, should you be using Matrix::crossprod in this code in data_lowrank.R?

if (n == 1)
  return(Matrix::rowSums(lr[[1]] * (x[[1]] %*% crossprod(x[[2]], u))))
if (n == 2)
  return(Matrix::rowSums(lr[[2]] * (x[[2]] %*% t(crossprod(u, x[[1]])))))

willwerscheid · 2025-01-23T19:32:33Z

Matrix::crossprod is in the NAMESPACE already; I think we can actually remove Matrix:: from the call to rowSums()

pcarbo · 2025-01-23T19:34:07Z

Hm, not sure, but you may have to be explicit since base::crossprod is also (implicitly) in the NAMESPACE. (I think.)

pcarbo · 2025-01-23T19:37:01Z

The issue is that X(X'X) is not in fact sparse.

Well, X'X is typically not sparse (i.e., has many nonzeros).

Just to be absolutely clear: crossprod(X) returns a sparse matrix (i.e., sparse matrix data structure) when X is a sparse matrix, but the sparse matrix output will typically have many nonzeros.

pcarbo · 2025-01-23T19:38:17Z

The term "sparse matrix" is ambiguous here because it could refer to a sparse matrix object (e.g., "dgCMatrix" class), or a matrix that has a high proportion of nonzeros. And I think you are referring to both at different times. (So am I.)

YushaLiu · 2025-01-23T21:14:13Z

I didn't follow all the conversations above, but I'm curious if there is a way to calculate Y2 more efficiently than what I did in the pull request, if Y=XX'? @willwerscheid

Also, I personally feel that calculating Y2 is most useful in the case that Y=XX', where Y is closely related to covariance matrix (as in GBCD) -- columns and rows of Y are symmetric here so it is reasonable to assume var_type = 0. If Y = X_1 X_2' with X_1 \ne X_2, I'm not sure if var_type = 0 is still reasonable; if we need to assumevar_type = 1 or 2, then calculating Y2 will not be enough for updating the residual variance parameters in flash?

Update flash_init.R

7fd3390

YushaLiu added 4 commits January 22, 2025 00:12

Document Y2.

737545d

Change Y2_val to Y2 and document Y2

433664a

Change Y2_val to Y2 and document Y2

14e8b64

Change Y2_val to Y2 and document Y2

22dce39

YushaLiu added 2 commits January 22, 2025 13:29

Update flash_init.R

27641d9

Update flash_init.R

b966fdc

Update flash_init.R #137

Are you sure you want to change the base?

Update flash_init.R #137

Uh oh!

Conversation

YushaLiu commented Jan 21, 2025

Uh oh!

pcarbo commented Jan 21, 2025

Uh oh!

pcarbo commented Jan 22, 2025

Uh oh!

YushaLiu commented Jan 22, 2025

Uh oh!

pcarbo commented Jan 22, 2025

Uh oh!

willwerscheid commented Jan 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pcarbo commented Jan 22, 2025

Uh oh!

willwerscheid commented Jan 23, 2025

Uh oh!

pcarbo commented Jan 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

willwerscheid commented Jan 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pcarbo commented Jan 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pcarbo commented Jan 23, 2025

Uh oh!

pcarbo commented Jan 23, 2025

Uh oh!

YushaLiu commented Jan 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

willwerscheid commented Jan 22, 2025 •

edited

Loading

pcarbo commented Jan 23, 2025 •

edited

Loading

willwerscheid commented Jan 23, 2025 •

edited

Loading

pcarbo commented Jan 23, 2025 •

edited

Loading