Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API integration for user-submitted differential expression calculations (SCP-5767) #2212

Merged
merged 8 commits into from
Mar 5, 2025

Conversation

bistline
Copy link
Contributor

@bistline bistline commented Mar 3, 2025

BACKGROUND

Currently, there are only two ways a study can have differential expression results: automatic calculation of one-vs-rest on cell-type or raw annotations, or if a study owner supplies their own results. While SCP has support for calculating pairwise results, we don't compute these automatically due to the potential for cost overruns. Additionally, we do not calculate one-vs-rest results for all annotations due to known limitations.

CHANGES

Now, there is an API endpoint that will accept submissions for new differential expression calculations. These results can be one-vs-rest or pairwise for a given cluster/annotation in a study. This endpoint is currently not integrated anywhere in the application, and is only available from the Swagger documentation. It will validate all incoming requests and respond as follows:

  • Valid job parameters submitted for calculation: 204 No content
  • User not signed in: 401 Unauthorized
  • User cannot view study or study has author-supplied DE results: 403 forbidden
  • Missing study/cluster/annotation: 404 Not found
  • DE parameters do not result in a valid job: 422 Unprocessable entity

MANUAL TESTING

  1. Boot all services and open Swagger documentation
  2. Go to the DE submission endpoint and authenticate with the lock icon
  3. Find the accession for your Human milk - differential expression study and submit a request for pairwise results:
{
  "cluster_name": "All Cells UMAP",
  "annotation_name": "General_Celltype",
  "annotation_scope": "study",
  "de_type": "pairwise",
  "group1": "LC1",
  "group2": "LC2"
}
  1. Look in development.log to see that a job was submitted to the Batch API
  2. Once that job completes, confirm you get a success email
  3. Go to that study in the Explore tab and confirm you can load pairwise DE results for LC1 & LC2, and that the first gene is ASCC3
  4. Try submitting another request for the same DE result and confirm you get 409 and the message Requested results already exist
  5. Try submitting requests for various invalid combinations (annotation/cluster doesn't exist, pairwise with labels not represented, etc) and confirm you get the appropriate errors & messages
  6. (Optional) If you have a study in your local instance with author-supplied DE results, try to submit any request in that study and confirm you get 403 Forbidden with a message User requests are disabled for this study as it contains author-supplied differential expression results

@bistline bistline requested review from eweitz and jlchang March 3, 2025 18:14
Copy link

codecov bot commented Mar 3, 2025

Codecov Report

Attention: Patch coverage is 97.56098% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.33%. Comparing base (0a4d04a) to head (a27746c).
Report is 12 commits behind head on development.

Files with missing lines Patch % Lines
app/controllers/api/v1/site_controller.rb 97.82% 2 Missing ⚠️
app/models/ingest_job.rb 50.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@               Coverage Diff               @@
##           development    #2212      +/-   ##
===============================================
+ Coverage        70.21%   70.33%   +0.11%     
===============================================
  Files              333      332       -1     
  Lines            28437    28492      +55     
  Branches          2518     2518              
===============================================
+ Hits             19967    20039      +72     
+ Misses            8323     8306      -17     
  Partials           147      147              
Files with missing lines Coverage Δ
app/lib/differential_expression_service.rb 91.39% <100.00%> (+1.28%) ⬆️
app/models/admin_configuration.rb 46.15% <100.00%> (ø)
app/models/user.rb 73.03% <100.00%> (+0.13%) ⬆️
app/models/ingest_job.rb 55.74% <50.00%> (-0.02%) ⬇️
app/controllers/api/v1/site_controller.rb 79.32% <97.82%> (+2.18%) ⬆️

... and 11 files with indirect coverage changes

Copy link
Member

@eweitz eweitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good!

How do you foresee accounting for group order in pairwise DE requests? We know that pairwise DE for (A, B) is equal to pairwise DE for (B, A) via log2FC(A, B) = -1 * log2FC(B, A).

That'd halve compute and storage cost.

Perhaps the server here could naturally-order the groups in pairwise DE upon starting to process the post request. I don't think that's essential to do in this PR, but I'd advocate handling it before making the functionality GA.

@bistline
Copy link
Contributor Author

bistline commented Mar 4, 2025

Code looks good!

How do you foresee accounting for group order in pairwise DE requests? We know that pairwise DE for (A, B) is equal to pairwise DE for (B, A) via log2FC(A, B) = -1 * log2FC(B, A).

That'd halve compute and storage cost.

Perhaps the server here could naturally-order the groups in pairwise DE upon starting to process the post request. I don't think that's essential to do in this PR, but I'd advocate handling it before making the functionality GA.

Unless I'm mistaken, we had said that these parameters are order-dependent and we would not sort them. Group 1 is always first, and group 2 is always second. @jlchang can you confirm?

@bistline
Copy link
Contributor Author

bistline commented Mar 4, 2025

Code looks good!
How do you foresee accounting for group order in pairwise DE requests? We know that pairwise DE for (A, B) is equal to pairwise DE for (B, A) via log2FC(A, B) = -1 * log2FC(B, A).
That'd halve compute and storage cost.
Perhaps the server here could naturally-order the groups in pairwise DE upon starting to process the post request. I don't think that's essential to do in this PR, but I'd advocate handling it before making the functionality GA.

Unless I'm mistaken, we had said that these parameters are order-dependent and we would not sort them. Group 1 is always first, and group 2 is always second. @jlchang can you confirm?

Based off of our demo discussion, we will move forward with this plan. I've tracked it in SCP-5944 - it will need both client- and server-side updates, but they should be fairly minimal.

@bistline
Copy link
Contributor Author

bistline commented Mar 4, 2025

@eweitz @jlchang I've added user quotas in a915c5a if you'd like to check that out. You can test the rate limiting by updating the weekly_de_quota value on your user account directly and then attempting to submit a request. There's also now an admin config option for this to change it, and a global default value of 5.

@jlchang
Copy link
Contributor

jlchang commented Mar 4, 2025

@bistline If I submit either of the following DE jobs, I expected 422 but got 500:
{
"cluster_name": "All Cells UMAP",
"annotation_name": "General_Celltype",
"annotation_scope": "study",
"de_type": "pairwise",
"group1": "LC1",
"group2": "bar",
}
OR
{
"cluster_name": "All Cells UMAP",
"annotation_name": "foo",
"annotation_scope": "study",
"de_type": "pairwise",
"group1": "LC1",
}

Copy link
Contributor

@jlchang jlchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional review: job submitted and ran as described. Errors, including exceeding weekly quota mostly return as expected.

Approving unless the 500 (see prior comment) should have been a 422. If the particular incorrect de_job submissions are expected to return 500, please just indicate in comments what type of job parameter validation should have been tested.

@bistline
Copy link
Contributor Author

bistline commented Mar 4, 2025

@bistline If I submit either of the following DE jobs, I expected 422 but got 500: { "cluster_name": "All Cells UMAP", "annotation_name": "General_Celltype", "annotation_scope": "study", "de_type": "pairwise", "group1": "LC1", "group2": "bar", } OR { "cluster_name": "All Cells UMAP", "annotation_name": "foo", "annotation_scope": "study", "de_type": "pairwise", "group1": "LC1", }

Paired with Jean, and the issue is that the request is malformed JSON (extra trailing comma) and our middleware couldn't parse the request:

Completed 400 Bad Request in 463ms (MongoDB: 0.3ms | Allocations: 45618)

859: unexpected token at '{
  "cluster_name": "All Cells UMAP",
  "annotation_name": "General_Celltype",
  "annotation_scope": "study",
  "de_type": "pairwise",
  "group1": "eosinophils",
  "group2": "foo",
}' excluded from capture: DSN not set
  
ActionDispatch::Http::Parameters::ParseError (859: unexpected token at '{
  "cluster_name": "All Cells UMAP",
  "annotation_name": "General_Celltype",
  "annotation_scope": "study",
  "de_type": "pairwise",
  "group1": "eosinophils",
  "group2": "foo",
}'):

In production, this will actually manifest as a 400 Bad request which is correct, but because of weirdness in local development and our custom error handling, we always get that unhelpful 500.

@bistline bistline merged commit 12f666c into development Mar 5, 2025
5 checks passed
@github-actions github-actions bot deleted the jb-pairwise-de-api branch March 5, 2025 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants