
New dep files#1195

Open
tech3371 wants to merge 6 commits into IMAP-Science-Operations-Center:dev from tech3371:new_dep_files

Conversation

@tech3371
Contributor

Change Summary

closes #1153

Overview

These are changes that we are building up slowly in the dependency_refactor branch. They won't be implemented and pushed to production yet; they are incremental dependency refactoring changes.

File changes

Added new dependency files for individual instrument teams. Each file contains data products and their upstream dependencies. This is closely related to the documentation in this PR; the two are dependent on and build upon each other. Once this PR is in good shape and ready to merge, I plan to go back and update that documentation PR based on the outcome of this PR.

The main changes in this PR are separating the dependencies for each instrument into their own files and formatting them in the new design. I then added code that reads and validates the dependency node information.
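As a rough sketch of what reading and validating one of these per-instrument files might look like, assuming the YAML has already been parsed into a dict (e.g. with `yaml.safe_load`) — the names `REQUIRED_KEYS` and `validate_node` here are illustrative, not the actual API in `dependency_new.py`:

```python
# Minimal sketch of a per-instrument dependency reader. Assumes the YAML
# file was already parsed into a dict; key and helper names are hypothetical.
REQUIRED_KEYS = {"upstream_source", "upstream_data_type", "upstream_descriptor"}


def validate_node(node: dict) -> bool:
    """Raise ValueError if a dependency node is missing required keys."""
    missing = REQUIRED_KEYS - set(node)
    if missing:
        raise ValueError(f"Node missing required keys: {sorted(missing)}")
    return True


# Parsed content of a hypothetical instrument dependency YAML file.
parsed = {
    "(l1a, all)": [
        {
            "upstream_source": "codice",
            "upstream_data_type": "l0",
            "upstream_descriptor": "raw",
        },
    ],
}

for product_key, nodes in parsed.items():
    for node in nodes:
        validate_node(node)  # raises ValueError on malformed entries
```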

Testing

@tech3371 tech3371 self-assigned this Mar 25, 2026
@tech3371 tech3371 added the enhancement New feature or request label Mar 25, 2026
@tech3371 tech3371 changed the base branch from dependency_refactor to dev March 25, 2026 21:14
Contributor

Copilot AI left a comment


Pull request overview

This PR continues the incremental dependency refactor by splitting pipeline dependency configurations into per-instrument YAML files and adding code/tests to load and validate the new format. It also includes related changes to dependency resolution behavior (Hi Goodtimes multi-repoint support, non-daily coverage handling), SPICE metakernel time conversion, and API-key scope enforcement.

Changes:

  • Add per-instrument dependency YAML files and a new DependencyConfigReader to parse/validate them.
  • Extend dependency resolution logic (Hi Goodtimes N-nearest repoints, non-daily instrument coverage skip, helper queries + tests).
  • Update API-key authorization/scope behavior and SPICE metakernel utilities/tests; bump imap-data-access / imap-processing versions.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/lambda_endpoints/test_upload_api.py Adds authorizer context to upload tests; adds 403 tests for read/empty scope.
tests/lambda_endpoints/test_spice_metakernel_api.py Unskips/expands metakernel tests; uploads kernels to mock S3 for time conversion coverage.
tests/lambda_endpoints/test_scheduled_job.py Updates scheduled job tests to use shared upload mocking fixture.
tests/lambda_endpoints/test_dependency_new.py New unit tests for the new dependency YAML reader/validator.
tests/lambda_endpoints/test_dependency_api.py Adds extensive test coverage for repoint/list handling, nearest-neighbor selection, and INPROGRESS skip logic.
tests/lambda_endpoints/test_batch_starter.py Updates batch starter tests and adds Hi Goodtimes multi-repoint trigger + IDEX buffered query tests.
tests/lambda_endpoints/conftest.py Adds mock_upload_request_success fixture to isolate tests from real upload/HTTP calls.
tests/infrastructure/test_api_key_dynamodb.py Updates scope semantics and adds write/upload restriction tests for read-only keys.
sds_data_manager/lambda_code/processing/requirements.txt Bumps imap-data-access and imap-processing versions for processing lambda package set.
sds_data_manager/lambda_code/authorization/manage_api_keys.py Adds scope validation + improved CLI usage/docs for full vs read.
sds_data_manager/lambda_code/authorization/lambda_api_key_authorizer.py Introduces scope/method/path checks and structured logging for authorization decisions.
sds_data_manager/lambda_code/SDSCode/spice_utilities.py New shared SPICE helpers to download kernels from S3 and furnish “best” kernel.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/spice_indexer.py Refactors to reuse shared SPICE utility functions.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency_new.py New reader/validator for per-instrument dependency YAML format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency_config.csv Updates legacy CSV dependency mappings (notably Hi Goodtimes and some SPICE deps).
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency.py Adds non-daily handling, session reuse, multi-repoint dependencies, nearest-neighbor helpers, and Hi Goodtimes extensions.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_swe_dependencies.yaml New SWE dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_swapi_dependencies.yaml New SWAPI dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_spacecraft_dependencies.yaml New Spacecraft dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_mag_dependencies.yaml New MAG dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_lo_dependencies.yaml New LO dependency YAML in new format (includes anchors/aliases).
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_idex_dependencies.yaml New IDEX dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_hit_dependencies.yaml New HIT dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_hi_dependencies.yaml New HI dependency YAML in new format (includes Goodtimes date_range).
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_glows_dependencies.yaml New GLOWS dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_codice_dependencies.yaml New CoDICE dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/batch_starter.py Adjusts dependency config usage; adds IDEX buffered query start; implements Hi Goodtimes multi-repoint triggering; updates upload API invocation event.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/init.py Adds NON_DAILY_INSTRUMENTS list.
sds_data_manager/lambda_code/SDSCode/api_lambdas/upload_api.py Adds scope enforcement (deny read/missing scope) and logging for upload requests.
sds_data_manager/lambda_code/SDSCode/api_lambdas/spice_metakernel_api.py Adds leapsecond kernel furnishing path to enable human-readable date inputs.
sds_data_manager/lambda_code/SDSCode/api_lambdas/metakernel.py Refactors metakernel file selection logic and internal gap tracking structures.
sds_data_manager/lambda_code/IAlirtCode/ialirt_realtime.py Sanitizes non-finite floats in JSON output before uploading to S3.
sds_data_manager/lambda_code/IAlirtCode/ialirt_data_query_api.py Updates scope allowlist to include read and adjusts filtering behavior.
sds_data_manager/lambda_code/IAlirtCode/ialirt_coverage.py Adds UKSA schedule ingestion and passes it into coverage generation; adjusts DSN sorting.
sds_data_manager/constructs/sds_api_manager_construct.py Updates SPICE metakernel lambda env/memory/timeout; adds S3 read policy and IMAP_DATA_DIR/S3_BUCKET env vars.
sds_data_manager/constructs/processing_construct.py Switches processing compute environment from spot to on-demand.
sds_data_manager/constructs/ialirt_processing_construct.py Expands partner access ports for LASP/UKSA.
sds_data_manager/constructs/ialirt_api_manager_construct.py Renames/re-wires IALiRT query endpoints and adds reserved concurrency; adjusts which routes are auth vs restricted.
pyproject.toml Bumps imap-processing version.
poetry.lock Updates lockfile for bumped dependency versions / Poetry version.
lambda_layer/spice/requirements.txt Bumps imap-data-access in SPICE lambda layer.
lambda_layer/database/requirements.txt Bumps imap-data-access in DB lambda layer.
README.md Documents scope options (full, read) and updates API key management examples.


Comment on lines +139 to +188
def validate_node(self, node: list) -> bool:
"""Validate a dependency node.

A valid node must have exactly 5 elements or 6 elements:
(
source,
data_type,
descriptor,
required,
kickoff_job,
Optional(past, future)
)
If it includes past/future date ranges, it should follow the following format:
- p - pointing
- h - hourly
- d - days
- l - last_processed
- n - nearest
past and future should end with one of these options. Eg.
("-3p", "3pm") means 3 pointing
("-3d", "5d") means 5 days
("-2h", "2h") means 2 hours
("1l",) means last processed

Validation is performed for each field.

Parameters
----------
node : DependencyNode
Node to validate.

Returns
-------
bool
True if node is valid.

Raises
------
ValueError
If node format is invalid or contains invalid values.

Examples
--------
>>> config = DependencyConfig()
>>> config.validate_node(('codice', 'l1a', 'all'))
True
>>> config.validate_node(('invalid', 'l1a', 'all'))
Traceback (most recent call last):
...
ValueError: Invalid data source...

Copilot AI Mar 25, 2026


validate_node is now validating dict-shaped nodes (with upstream_* keys), but the signature/type hint and docstring still describe a tuple/list-based “5 or 6 element” format. Updating the type annotation and docstring will prevent confusion and make the public contract clear.

Suggested change
def validate_node(self, node: list) -> bool:
"""Validate a dependency node.
A valid node must have exactly 5 elements or 6 elements:
(
source,
data_type,
descriptor,
required,
kickoff_job,
Optional(past, future)
)
If it includes past/future date ranges, it should follow the following format:
- p - pointing
- h - hourly
- d - days
- l - last_processed
- n - nearest
past and future should end with one of these options. Eg.
("-3p", "3pm") means 3 pointing
("-3d", "5d") means 5 days
("-2h", "2h") means 2 hours
("1l",) means last processed
Validation is performed for each field.
Parameters
----------
node : DependencyNode
Node to validate.
Returns
-------
bool
True if node is valid.
Raises
------
ValueError
If node format is invalid or contains invalid values.
Examples
--------
>>> config = DependencyConfig()
>>> config.validate_node(('codice', 'l1a', 'all'))
True
>>> config.validate_node(('invalid', 'l1a', 'all'))
Traceback (most recent call last):
...
ValueError: Invalid data source...
def validate_node(self, node: dict) -> bool:
"""Validate a dependency node.
A valid node is a mapping describing a single upstream dependency with
the following fields:
- ``upstream_source`` (str)
- ``upstream_data_type`` (str)
- ``upstream_descriptor`` (str)
- ``required`` (bool)
- ``kickoff_job`` (bool)
- optional ``date_range`` describing the past/future window
The optional date range, when present, controls how far in the past and
future to look for matching data. It uses a compact string format, where
the unit is encoded in the suffix:
- ``p`` - pointing
- ``h`` - hourly
- ``d`` - days
- ``l`` - last_processed
- ``n`` - nearest
Past and future values must end with one of these unit specifiers. For
example:
- ``("-3p", "3p")`` means 3 pointings in the past and 3 in the future
- ``("-3d", "5d")`` means from 3 days in the past to 5 days in the future
- ``("-2h", "2h")`` means from 2 hours in the past to 2 hours in the future
- ``("1l",)`` means the last processed value
Validation is performed for each field via the internal helpers on this
class.
Parameters
----------
node : dict
Dependency node to validate.
Returns
-------
bool
True if the node is valid.
Raises
------
ValueError
If the node format is invalid or contains invalid values.
Examples
--------
>>> config = DependencyConfigReader()
>>> node = {
... "upstream_source": "codice",
... "upstream_data_type": "l1a",
... "upstream_descriptor": "all",
... "required": True,
... "kickoff_job": False,
... }
>>> config.validate_node(node)
True

@tech3371
Contributor Author

I will switch base to dependency_refactor at merge time or something.

- upstream_source: hi
upstream_data_type: l1b
upstream_descriptor: 45sensor-de
date_range: ["-3p", "3p"]
Contributor Author


Please look here for how to use the optional date_range field.
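The compact date_range tokens shown above (`"-3p"`, `"3p"`) could be parsed with a small helper like this. This is only a sketch: `parse_range_token` and `UNITS` are hypothetical names, with the unit letters taken from the docstring quoted earlier in this PR:

```python
import re

# Unit suffixes from the validate_node docstring in this PR:
# p = pointing, h = hourly, d = days, l = last_processed, n = nearest.
UNITS = {
    "p": "pointing",
    "h": "hourly",
    "d": "days",
    "l": "last_processed",
    "n": "nearest",
}


def parse_range_token(token: str) -> tuple[int, str]:
    """Split a token like '-3p' into a (count, unit_name) pair."""
    match = re.fullmatch(r"(-?\d+)([phdln])", token)
    if match is None:
        raise ValueError(f"Invalid date_range token: {token!r}")
    return int(match.group(1)), UNITS[match.group(2)]


parse_range_token("-3p")  # (-3, 'pointing'): 3 pointings in the past
parse_range_token("3d")   # (3, 'days'): 3 days in the future
```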

- upstream_source: planetary_ephemeris
upstream_data_type: spice
upstream_descriptor: historical
kickoff_job: false
Contributor Author


kickoff_job is true by default. If it should not be true, then we need to override it like this. I'm commenting on this as an example and as something to look out for.

This helps remove a lot of duplicate lines.

Keys and values were added to make reading this information in the code easier and to reduce the human error of putting information in the wrong place in this file. A list can cause issues if information is put at the wrong index, but key-value pairs reduce that risk. That's why I went with this approach.

@tech3371
Contributor Author

@pleasant-menlo I am tagging you on this PR for awareness purposes only. I don't expect our current CLI inputs or how dependency information is passed to change. This is for internal SDC backend improvements and features we need to add to support new cases that have come up in L1 and L2.

You can expect L3 inputs to be the same as they have been, so no worries there :). But since how we store dependency information is changing, I am tagging you for awareness, since you have been very helpful there for L3.

Contributor

@subagonsouth subagonsouth left a comment


Here is some feedback... some of it is very nit-picky and likely to be somewhat opinionated.

kickoff_job: false


spice_45sensor_l1b: &spice_45sensor_l1b
Contributor


spice_45sensor_l1b and spice_90sensor_l1b are identical and should be reduced to one definition of spice_l1b_de as they are specific to just the DE products.

Contributor


This is a bit of a nit pick, but I think that ordering of the definitions is important in these files for readability. I would suggest:

  • common definitions shared across many/all products. e.g. spice_basic.
  • Sub-sections ordered by level then type. In each sub-section order is:
    • common definitions for sub-section. e.g. l1b_de_spice, l2_spice
    • individual sensor definitions. e.g. l2_45sensor_ancillary
    • individual product definitions. e.g. (l2, h45-ena-h-hf-nsp-anti-hae-4deg-6mo)

I like how in the current CSV, it is easy to see all dependencies for a single product without scrolling all over the file.

Contributor


This file looks good to me.

# Anchor keys don't have parentheses, so skip them.
if key_str.startswith("#") or not key_str.startswith("("):
continue

Contributor


No need to skip comments. They are standard in YAML and will automatically get ignored by the reader.

Suggested change
# Skip any anchor definitions (common dependency groups).
# Product keys have format: (data_type, descriptor)
# Anchor keys don't have parentheses, so skip them.
if not key_str.startswith("("):
continue
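To illustrate the suggested simplification: YAML comments never reach the parsed mapping at all, so only the anchor-vs-product distinction matters. Product keys in these files look like `"(data_type, descriptor)"`, while anchor definitions (e.g. `spice_basic`) do not start with `"("`. A sketch over a hand-built list of keys:

```python
# Top-level keys as they might appear after parsing one of the new
# dependency YAML files. Anchor names and product keys are illustrative.
parsed_keys = ["spice_basic", "(l1b, 45sensor-de)", "(l2, hi-ena-map)"]

# Comments are stripped by the YAML parser, so a single startswith
# check is enough to keep only the product entries.
product_keys = [key for key in parsed_keys if key.startswith("(")]
# product_keys == ["(l1b, 45sensor-de)", "(l2, hi-ena-map)"]
```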

Contributor Author


Didn't know that. I will update it!

f"Node dict must contain keys {required_keys}, got {set(node.keys())}"
)

def _unpack_node(self, node):
Contributor


Could consider making a node definition a NamedTuple or DataClass. I think that would perhaps reduce some of the required validation.

Contributor Author


I tried using DependencyNode but I realized that we will need to add required, kickoff_job and so on to the definition. I can do that if this is preferred.

Contributor


Yep, defaults are easy to set in a dataclass. So either add them to DependencyNode, or make a class that inherits from DependencyNode and adds the default fields.

Then, I think you get some validation for free when instantiating like:
dep_node = DependencyNode(**upstream)
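A sketch of what that suggestion could look like — this `DependencyNode` is illustrative (the real DependencyNode in the codebase may have different fields), but it shows the free validation: the dataclass constructor rejects unknown keys and missing required fields with a TypeError:

```python
from dataclasses import dataclass


# Hypothetical dataclass version of a dependency node with the default
# field values discussed in this PR (required/kickoff_job default to True).
@dataclass
class DependencyNode:
    upstream_source: str
    upstream_data_type: str
    upstream_descriptor: str
    required: bool = True
    kickoff_job: bool = True


# A node dict as parsed from the YAML; defaults fill in automatically.
upstream = {
    "upstream_source": "hi",
    "upstream_data_type": "l1b",
    "upstream_descriptor": "45sensor-de",
}
dep_node = DependencyNode(**upstream)
# dep_node.kickoff_job is True; unknown or missing required keys
# raise TypeError at construction time, e.g. DependencyNode(bogus=1).
```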

@tech3371 tech3371 mentioned this pull request Mar 27, 2026

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Task: Transition all instrument into their own new dependency config

3 participants