
pi-zz-a (Contributor) commented Dec 15, 2024

Ordering the DNA and RNA count matrices by EID when the MPRASet is created solves the original EID-ordering problem. That problem was previously worked around by aggregating the counts when computing the log ratio, even when aggregate = "none". With the proposed change, log ratios can be computed without aggregating the counts.
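
A minimal sketch of ordering both count matrices by EID. It assumes that dna and rna have the same rows in the same order and that eid is a character vector with one entry per row (one per barcode). sort_by_eid is a hypothetical helper, not the PR's actual code or part of the package. It reorders rows with order() on the EID vector, a permutation of row positions, instead of indexing by row name:

```r
# Hypothetical helper (not part of the mpra package): reorder the DNA and RNA
# count matrices, plus the EID vector, so that rows are grouped by EID.
# Assumes dna and rna share row order and eid has one entry per row (barcode).
sort_by_eid <- function(dna, rna, eid) {
  stopifnot(nrow(dna) == nrow(rna), nrow(dna) == length(eid))
  # order() returns a permutation of row positions, so every row is kept
  # exactly once even when several barcodes share an EID; ties keep their
  # original relative order.
  idx <- order(eid)
  list(dna = dna[idx, , drop = FALSE],
       rna = rna[idx, , drop = FALSE],
       eid = eid[idx])
}

# Example: two EIDs with two barcodes each, supplied out of order.
eid <- c("eid2", "eid1", "eid2", "eid1")
dna <- matrix(1:8, nrow = 4, dimnames = list(eid, c("s1", "s2")))
rna <- dna * 10L
res <- sort_by_eid(dna, rna, eid)
res$eid
# [1] "eid1" "eid1" "eid2" "eid2"
```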

lmyint (Collaborator) commented Jan 9, 2025

The proposed changes won't work if there are multiple barcodes per element. In the example below, there are 2 elements ("eid1" and "eid2") with 5 barcodes each. Indexing with dna[eid,] and rna[eid,] duplicates rows instead of performing the intended sort.

```
> mat <- matrix(1:30, nrow = 10, ncol = 3)
> eids <- rep(paste0("eid", 1:2), each = 5)
> rownames(mat) <- eids
> 
> eids
 [1] "eid1" "eid1" "eid1" "eid1" "eid1" "eid2" "eid2" "eid2" "eid2" "eid2"
> mat
     [,1] [,2] [,3]
eid1    1   11   21
eid1    2   12   22
eid1    3   13   23
eid1    4   14   24
eid1    5   15   25
eid2    6   16   26
eid2    7   17   27
eid2    8   18   28
eid2    9   19   29
eid2   10   20   30
> mat[eids,]
     [,1] [,2] [,3]
eid1    1   11   21
eid1    1   11   21
eid1    1   11   21
eid1    1   11   21
eid1    1   11   21
eid2    6   16   26
eid2    6   16   26
eid2    6   16   26
eid2    6   16   26
eid2    6   16   26
```
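
The duplication happens because R resolves each character row subscript to the first row carrying that name, the same positions match() returns. So every "eid1" request above yields row 1 and every "eid2" request yields row 6. A quick check using the same mat and eids as the example:

```r
# Rebuild the example: 2 EIDs with 5 barcodes each.
mat <- matrix(1:30, nrow = 10, ncol = 3)
eids <- rep(paste0("eid", 1:2), each = 5)
rownames(mat) <- eids

# Each requested name maps to the first row with that name,
# so mat[eids, ] reuses rows 1 and 6 five times each.
match(eids, rownames(mat))
# [1] 1 1 1 1 1 6 6 6 6 6

# Duplicated row names signal that name-based reordering will not work;
# anyDuplicated() returns the index of the first duplicate (0 if none).
anyDuplicated(rownames(mat))
# [1] 2
```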

We included aggregation before the log-ratio computation even when aggregate == "none" because a user should only pick aggregate == "none" when there is a single barcode per EID or when the counts have already been aggregated across each EID's barcodes. In those cases, aggregation leaves the counts unchanged; it only sorts the count matrices by EID.
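
To illustrate that point in base R, here is a sketch that uses rowsum() as a stand-in for the package's aggregation step. This is an assumption: the package's own aggregation code may differ, and it offers a mean as well as a sum. With exactly one barcode per EID, summing within EID leaves every count unchanged and only reorders the rows by EID:

```r
# One barcode per EID, rows supplied out of EID order.
eid <- c("eid3", "eid1", "eid2")
dna <- matrix(c(10, 20, 30, 1, 2, 3), nrow = 3,
              dimnames = list(eid, c("s1", "s2")))

# rowsum() sums rows that share a group label; with reorder = TRUE
# (the default) the result rows are sorted by group.
agg <- rowsum(dna, group = eid)

# Identical counts to simply sorting the rows by EID.
sorted <- dna[order(eid), , drop = FALSE]
all.equal(agg, sorted)
# [1] TRUE
```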

We don't offer an option for computing and returning log ratios at the barcode level because the package is meant to facilitate differential analysis at the EID level. A user who wants barcode-level log ratios can compute them directly with logr <- log2(rna + 1) - log2(dna + 1), using the rna and dna count matrices they supplied to the MPRASet() constructor.
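
A minimal sketch of that barcode-level computation. It assumes rna and dna are barcode-by-sample count matrices with matching rows and columns, like those passed to MPRASet(). The small matrices below are made-up illustrative values:

```r
# Illustrative counts: 4 barcodes (rows) x 2 samples (columns).
dna <- matrix(c(7, 15, 3, 0, 31, 1, 63, 5), nrow = 4,
              dimnames = list(paste0("bc", 1:4), c("s1", "s2")))
rna <- matrix(c(15, 7, 7, 1, 63, 0, 31, 2), nrow = 4,
              dimnames = dimnames(dna))

# A pseudocount of 1 avoids log2(0) for barcodes with zero counts.
logr <- log2(rna + 1) - log2(dna + 1)
logr
```

Because the result keeps the barcode row names, it can still be matched back to EIDs afterwards if needed.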

pi-zz-a (Contributor, Author) commented Jan 13, 2025 via email
