
@xyg123

xyg123 commented Mar 27, 2025

This PR adds the step config for running the interval index step, which is needed to generate interval features for L2G.

It processes the interval files from their raw format, then identifies the interval regions that overlap our variant index.
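To illustrate the overlap idea, here is a minimal pure-Python sketch. It is not the gentropy/Spark implementation in this PR. The `Interval` and `Variant` record shapes, the `gene_id` field, the function name `overlap_intervals_with_variants`, and the use of 1-based inclusive coordinates are all assumptions made for illustration. Variants are grouped per chromosome and sorted by position, so each interval can find the variants it covers with two binary searches.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    # Hypothetical record shape; coordinates assumed 1-based and inclusive.
    chromosome: str
    start: int
    end: int
    gene_id: str


class Variant(NamedTuple):
    # Hypothetical record shape for one entry of a variant index.
    chromosome: str
    position: int
    variant_id: str


def overlap_intervals_with_variants(
    intervals: Iterable[Interval], variants: Iterable[Variant]
) -> list[tuple[str, str]]:
    """Return (variant_id, gene_id) pairs where a variant lies inside an interval."""
    # Group variants by chromosome and sort each group by position.
    by_chrom: dict[str, list[Variant]] = defaultdict(list)
    for variant in variants:
        by_chrom[variant.chromosome].append(variant)
    positions: dict[str, list[int]] = {}
    for chrom, group in by_chrom.items():
        group.sort(key=lambda v: v.position)
        positions[chrom] = [v.position for v in group]

    pairs: list[tuple[str, str]] = []
    for interval in intervals:
        group = by_chrom.get(interval.chromosome)
        if not group:
            continue  # no variants on this chromosome
        pos = positions[interval.chromosome]
        # Binary search for the slice of variants with start <= position <= end.
        lo = bisect_left(pos, interval.start)
        hi = bisect_right(pos, interval.end)
        for variant in group[lo:hi]:
            pairs.append((variant.variant_id, interval.gene_id))
    return pairs
```

For example, the intervals `Interval("1", 100, 200, "GENE_A")` and `Interval("1", 150, 300, "GENE_B")` together with the variants `Variant("1", 150, "1_150_A_G")` and `Variant("1", 250, "1_250_C_T")` yield `[("1_150_A_G", "GENE_A"), ("1_150_A_G", "GENE_B"), ("1_250_C_T", "GENE_B")]`. Overlapping intervals are handled naturally because each interval is searched independently. The cost is roughly O(I log V + output) per chromosome.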

xyg123 requested review from Copilot and project-defiant, then removed the request for Copilot (March 27, 2025 13:43)
Collaborator

project-defiant left a comment

This PR will require additional changes to gentropy.yaml and pis.yaml:

  • Ideally, we want to use PIS to move the raw interval sources to $release_bucket/input/interval. Have a look at pis.yaml.
  • Once the raw intervals are in input/intervals, we want to refer to them from gentropy.yaml in a similar way to how you defined them in genetics_etl.yaml.

Note

genetics_etl.yaml will be superseded by gentropy.yaml once we define the development workflow for orchestration and allow running a subset of the unified pipeline DAG.

@project-defiant
Collaborator

project-defiant commented Apr 1, 2025

@xyg123, can you post in the PR the link to the Google Cloud Dataproc job that executed the intervalStep, as well as to the other steps that depend on it?

@xyg123
Author

xyg123 commented Apr 4, 2025

Successful run of the interval step to generate the interval index; it took ~10 hours.

@project-defiant
Collaborator

@xyg123, we need to make some strategic decisions here, because we cannot have a step that takes ~10 hours to compute during every release.
