
New dep files#1195

Open
tech3371 wants to merge 6 commits into IMAP-Science-Operations-Center:dev from tech3371:new_dep_files

Conversation

@tech3371
Contributor

Change Summary

closes #1153

Overview

These are changes that we are building up slowly in the dependency_refactor branch. They won't be implemented and pushed to production yet; they are incremental dependency refactoring changes.

File changes

Added new dependency files for individual instrument teams. Each file contains data products and their upstream dependencies. This is closely related to the documentation in this PR; the two are dependent on and build upon each other. Once this PR is in good shape and ready to merge, I plan to go back and update that documentation PR based on the outcome of this PR.

The main changes in this PR are separating the dependencies for each instrument into their own files and formatting them in the new design. I then added code that reads and validates the dependency node information.
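As a rough sketch of what reading and validating one of these per-instrument files might look like, assuming the YAML has already been parsed into a dict (e.g. with `yaml.safe_load`) — the names `REQUIRED_KEYS` and `validate_node` here are illustrative, not the actual API in `dependency_new.py`:

```python
# Minimal sketch of a per-instrument dependency reader. Assumes the YAML
# file was already parsed into a dict; key and helper names are hypothetical.
REQUIRED_KEYS = {"upstream_source", "upstream_data_type", "upstream_descriptor"}


def validate_node(node: dict) -> bool:
    """Raise ValueError if a dependency node is missing required keys."""
    missing = REQUIRED_KEYS - set(node)
    if missing:
        raise ValueError(f"Node missing required keys: {sorted(missing)}")
    return True


# Parsed content of a hypothetical instrument dependency YAML file.
parsed = {
    "(l1a, all)": [
        {
            "upstream_source": "codice",
            "upstream_data_type": "l0",
            "upstream_descriptor": "raw",
        },
    ],
}

for product_key, nodes in parsed.items():
    for node in nodes:
        validate_node(node)  # raises ValueError on malformed entries
```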

Testing

@tech3371 tech3371 self-assigned this Mar 25, 2026
@tech3371 tech3371 added the enhancement New feature or request label Mar 25, 2026
@tech3371 tech3371 changed the base branch from dependency_refactor to dev March 25, 2026 21:14
Contributor

Copilot AI left a comment


Pull request overview

This PR continues the incremental dependency refactor by splitting pipeline dependency configurations into per-instrument YAML files and adding code/tests to load and validate the new format. It also includes related changes to dependency resolution behavior (Hi Goodtimes multi-repoint support, non-daily coverage handling), SPICE metakernel time conversion, and API-key scope enforcement.

Changes:

  • Add per-instrument dependency YAML files and a new DependencyConfigReader to parse/validate them.
  • Extend dependency resolution logic (Hi Goodtimes N-nearest repoints, non-daily instrument coverage skip, helper queries + tests).
  • Update API-key authorization/scope behavior and SPICE metakernel utilities/tests; bump imap-data-access / imap-processing versions.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/lambda_endpoints/test_upload_api.py Adds authorizer context to upload tests; adds 403 tests for read/empty scope.
tests/lambda_endpoints/test_spice_metakernel_api.py Unskips/expands metakernel tests; uploads kernels to mock S3 for time conversion coverage.
tests/lambda_endpoints/test_scheduled_job.py Updates scheduled job tests to use shared upload mocking fixture.
tests/lambda_endpoints/test_dependency_new.py New unit tests for the new dependency YAML reader/validator.
tests/lambda_endpoints/test_dependency_api.py Adds extensive test coverage for repoint/list handling, nearest-neighbor selection, and INPROGRESS skip logic.
tests/lambda_endpoints/test_batch_starter.py Updates batch starter tests and adds Hi Goodtimes multi-repoint trigger + IDEX buffered query tests.
tests/lambda_endpoints/conftest.py Adds mock_upload_request_success fixture to isolate tests from real upload/HTTP calls.
tests/infrastructure/test_api_key_dynamodb.py Updates scope semantics and adds write/upload restriction tests for read-only keys.
sds_data_manager/lambda_code/processing/requirements.txt Bumps imap-data-access and imap-processing versions for processing lambda package set.
sds_data_manager/lambda_code/authorization/manage_api_keys.py Adds scope validation + improved CLI usage/docs for full vs read.
sds_data_manager/lambda_code/authorization/lambda_api_key_authorizer.py Introduces scope/method/path checks and structured logging for authorization decisions.
sds_data_manager/lambda_code/SDSCode/spice_utilities.py New shared SPICE helpers to download kernels from S3 and furnish “best” kernel.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/spice_indexer.py Refactors to reuse shared SPICE utility functions.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency_new.py New reader/validator for per-instrument dependency YAML format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency_config.csv Updates legacy CSV dependency mappings (notably Hi Goodtimes and some SPICE deps).
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency.py Adds non-daily handling, session reuse, multi-repoint dependencies, nearest-neighbor helpers, and Hi Goodtimes extensions.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_swe_dependencies.yaml New SWE dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_swapi_dependencies.yaml New SWAPI dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_spacecraft_dependencies.yaml New Spacecraft dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_mag_dependencies.yaml New MAG dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_lo_dependencies.yaml New LO dependency YAML in new format (includes anchors/aliases).
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_idex_dependencies.yaml New IDEX dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_hit_dependencies.yaml New HIT dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_hi_dependencies.yaml New HI dependency YAML in new format (includes Goodtimes date_range).
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_glows_dependencies.yaml New GLOWS dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_codice_dependencies.yaml New CoDICE dependency YAML in new format.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/batch_starter.py Adjusts dependency config usage; adds IDEX buffered query start; implements Hi Goodtimes multi-repoint triggering; updates upload API invocation event.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/init.py Adds NON_DAILY_INSTRUMENTS list.
sds_data_manager/lambda_code/SDSCode/api_lambdas/upload_api.py Adds scope enforcement (deny read/missing scope) and logging for upload requests.
sds_data_manager/lambda_code/SDSCode/api_lambdas/spice_metakernel_api.py Adds leapsecond kernel furnishing path to enable human-readable date inputs.
sds_data_manager/lambda_code/SDSCode/api_lambdas/metakernel.py Refactors metakernel file selection logic and internal gap tracking structures.
sds_data_manager/lambda_code/IAlirtCode/ialirt_realtime.py Sanitizes non-finite floats in JSON output before uploading to S3.
sds_data_manager/lambda_code/IAlirtCode/ialirt_data_query_api.py Updates scope allowlist to include read and adjusts filtering behavior.
sds_data_manager/lambda_code/IAlirtCode/ialirt_coverage.py Adds UKSA schedule ingestion and passes it into coverage generation; adjusts DSN sorting.
sds_data_manager/constructs/sds_api_manager_construct.py Updates SPICE metakernel lambda env/memory/timeout; adds S3 read policy and IMAP_DATA_DIR/S3_BUCKET env vars.
sds_data_manager/constructs/processing_construct.py Switches processing compute environment from spot to on-demand.
sds_data_manager/constructs/ialirt_processing_construct.py Expands partner access ports for LASP/UKSA.
sds_data_manager/constructs/ialirt_api_manager_construct.py Renames/re-wires IALiRT query endpoints and adds reserved concurrency; adjusts which routes are auth vs restricted.
pyproject.toml Bumps imap-processing version.
poetry.lock Updates lockfile for bumped dependency versions / Poetry version.
lambda_layer/spice/requirements.txt Bumps imap-data-access in SPICE lambda layer.
lambda_layer/database/requirements.txt Bumps imap-data-access in DB lambda layer.
README.md Documents scope options (full, read) and updates API key management examples.


Comment on lines +139 to +188
def validate_node(self, node: list) -> bool:
"""Validate a dependency node.

A valid node must have exactly 5 elements or 6 elements:
(
source,
data_type,
descriptor,
required,
kickoff_job,
Optional(past, future)
)
If it includes past/future date ranges, it should follow the following format:
- p - pointing
- h - hourly
- d - days
- l - last_processed
- n - nearest
past and future should end with one of these options. Eg.
("-3p", "3pm") means 3 pointing
("-3d", "5d") means 5 days
("-2h", "2h") means 2 hours
("1l",) means last processed

Validation is performed for each field.

Parameters
----------
node : DependencyNode
Node to validate.

Returns
-------
bool
True if node is valid.

Raises
------
ValueError
If node format is invalid or contains invalid values.

Examples
--------
>>> config = DependencyConfig()
>>> config.validate_node(('codice', 'l1a', 'all'))
True
>>> config.validate_node(('invalid', 'l1a', 'all'))
Traceback (most recent call last):
...
ValueError: Invalid data source...

Copilot AI Mar 25, 2026


validate_node is now validating dict-shaped nodes (with upstream_* keys), but the signature/type hint and docstring still describe a tuple/list-based “5 or 6 element” format. Updating the type annotation and docstring will prevent confusion and make the public contract clear.

Suggested change
def validate_node(self, node: list) -> bool:
"""Validate a dependency node.
A valid node must have exactly 5 elements or 6 elements:
(
source,
data_type,
descriptor,
required,
kickoff_job,
Optional(past, future)
)
If it includes past/future date ranges, it should follow the following format:
- p - pointing
- h - hourly
- d - days
- l - last_processed
- n - nearest
past and future should end with one of these options. Eg.
("-3p", "3pm") means 3 pointing
("-3d", "5d") means 5 days
("-2h", "2h") means 2 hours
("1l",) means last processed
Validation is performed for each field.
Parameters
----------
node : DependencyNode
Node to validate.
Returns
-------
bool
True if node is valid.
Raises
------
ValueError
If node format is invalid or contains invalid values.
Examples
--------
>>> config = DependencyConfig()
>>> config.validate_node(('codice', 'l1a', 'all'))
True
>>> config.validate_node(('invalid', 'l1a', 'all'))
Traceback (most recent call last):
...
ValueError: Invalid data source...
def validate_node(self, node: dict) -> bool:
"""Validate a dependency node.
A valid node is a mapping describing a single upstream dependency with
the following fields:
- ``upstream_source`` (str)
- ``upstream_data_type`` (str)
- ``upstream_descriptor`` (str)
- ``required`` (bool)
- ``kickoff_job`` (bool)
- optional ``date_range`` describing the past/future window
The optional date range, when present, controls how far in the past and
future to look for matching data. It uses a compact string format, where
the unit is encoded in the suffix:
- ``p`` - pointing
- ``h`` - hourly
- ``d`` - days
- ``l`` - last_processed
- ``n`` - nearest
Past and future values must end with one of these unit specifiers. For
example:
- ``("-3p", "3p")`` means 3 pointings in the past and 3 in the future
- ``("-3d", "5d")`` means from 3 days in the past to 5 days in the future
- ``("-2h", "2h")`` means from 2 hours in the past to 2 hours in the future
- ``("1l",)`` means the last processed value
Validation is performed for each field via the internal helpers on this
class.
Parameters
----------
node : dict
Dependency node to validate.
Returns
-------
bool
True if the node is valid.
Raises
------
ValueError
If the node format is invalid or contains invalid values.
Examples
--------
>>> config = DependencyConfigReader()
>>> node = {
... "upstream_source": "codice",
... "upstream_data_type": "l1a",
... "upstream_descriptor": "all",
... "required": True,
... "kickoff_job": False,
... }
>>> config.validate_node(node)
True

@tech3371
Contributor Author

I will switch base to dependency_refactor at merge time or something.

- upstream_source: hi
upstream_data_type: l1b
upstream_descriptor: 45sensor-de
date_range: ["-3p", "3p"]
Contributor Author


Please look here for how to use the optional date_range field.
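The compact date_range tokens shown above (`"-3p"`, `"3p"`) could be parsed with a small helper like this. This is only a sketch: `parse_range_token` and `UNITS` are hypothetical names, with the unit letters taken from the docstring quoted earlier in this PR:

```python
import re

# Unit suffixes from the validate_node docstring in this PR:
# p = pointing, h = hourly, d = days, l = last_processed, n = nearest.
UNITS = {
    "p": "pointing",
    "h": "hourly",
    "d": "days",
    "l": "last_processed",
    "n": "nearest",
}


def parse_range_token(token: str) -> tuple[int, str]:
    """Split a token like '-3p' into a (count, unit_name) pair."""
    match = re.fullmatch(r"(-?\d+)([phdln])", token)
    if match is None:
        raise ValueError(f"Invalid date_range token: {token!r}")
    return int(match.group(1)), UNITS[match.group(2)]


parse_range_token("-3p")  # (-3, 'pointing'): 3 pointings in the past
parse_range_token("3d")   # (3, 'days'): 3 days in the future
```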

- upstream_source: planetary_ephemeris
upstream_data_type: spice
upstream_descriptor: historical
kickoff_job: false
Contributor Author


kickoff_job is true by default. If it should not be true, then we need to override it like this. I'm commenting on this as an example and as something to look out for.

This helps remove a lot of duplicate lines.

Keys and values were added to make reading this information in the code easier and to reduce the human error of putting information in the wrong place in this file. A list can cause issues if information is put at the wrong index, but key-value pairs reduce that risk. That's why I went with this approach.

@tech3371
Contributor Author

@pleasant-menlo I am tagging you on this PR for awareness purposes only. I don't expect our current CLI inputs or how dependency information is passed to change. This is for internal SDC backend improvements and features we need to add to support new cases that have come up in L1 and L2.

You can expect L3 inputs to be the same as they have been, so no worries there :). But since how we store dependency information is changing, I am tagging you for awareness, since you have been very helpful there for L3.

Contributor

@subagonsouth subagonsouth left a comment


Here is some feedback... some of it is very nit-picky and likely to be somewhat opinionated.

kickoff_job: false


spice_45sensor_l1b: &spice_45sensor_l1b
Contributor


spice_45sensor_l1b and spice_90sensor_l1b are identical and should be reduced to one definition of spice_l1b_de as they are specific to just the DE products.

Contributor


This is a bit of a nit pick, but I think that ordering of the definitions is important in these files for readability. I would suggest:

  • common definitions shared across many/all products. e.g. spice_basic.
  • Sub-sections ordered by level then type. In each sub-section order is:
    • common definitions for sub-section. e.g. l1b_de_spice, l2_spice
    • individual sensor definitions. e.g. l2_45sensor_ancillary
    • individual product definitions. e.g. (l2, h45-ena-h-hf-nsp-anti-hae-4deg-6mo)

I like how in the current CSV, it is easy to see all dependencies for a single product without scrolling all over the file.

Contributor


This file looks good to me.

# Anchor keys don't have parentheses, so skip them.
if key_str.startswith("#") or not key_str.startswith("("):
continue

Contributor


No need to skip comments. They are standard in YAML and will automatically get ignored by the reader.

Suggested change
# Skip any anchor definitions (common dependency groups).
# Product keys have format: (data_type, descriptor)
# Anchor keys don't have parentheses, so skip them.
if not key_str.startswith("("):
continue
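To illustrate the suggested simplification: YAML comments never reach the parsed mapping at all, so only the anchor-vs-product distinction matters. Product keys in these files look like `"(data_type, descriptor)"`, while anchor definitions (e.g. `spice_basic`) do not start with `"("`. A sketch over a hand-built list of keys:

```python
# Top-level keys as they might appear after parsing one of the new
# dependency YAML files. Anchor names and product keys are illustrative.
parsed_keys = ["spice_basic", "(l1b, 45sensor-de)", "(l2, hi-ena-map)"]

# Comments are stripped by the YAML parser, so a single startswith
# check is enough to keep only the product entries.
product_keys = [key for key in parsed_keys if key.startswith("(")]
# product_keys == ["(l1b, 45sensor-de)", "(l2, hi-ena-map)"]
```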

Contributor Author


Didn't know that. I will update it!

f"Node dict must contain keys {required_keys}, got {set(node.keys())}"
)

def _unpack_node(self, node):
Contributor


Could consider making a node definition a NamedTuple or DataClass. I think that would perhaps reduce some of the required validation.

Contributor Author


I tried using DependencyNode but I realized that we will need to add required, kickoff_job and so on to the definition. I can do that if this is preferred.

Contributor


Yep, defaults are easy to set in a dataclass. So either add them to DependencyNode, or make a class that inherits from DependencyNode and adds the default fields.

Then, I think you get some validation for free when instantiating like:
dep_node = DependencyNode(**upstream)
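A sketch of what that suggestion could look like — this `DependencyNode` is illustrative (the real DependencyNode in the codebase may have different fields), but it shows the free validation: the dataclass constructor rejects unknown keys and missing required fields with a TypeError:

```python
from dataclasses import dataclass


# Hypothetical dataclass version of a dependency node with the default
# field values discussed in this PR (required/kickoff_job default to True).
@dataclass
class DependencyNode:
    upstream_source: str
    upstream_data_type: str
    upstream_descriptor: str
    required: bool = True
    kickoff_job: bool = True


# A node dict as parsed from the YAML; defaults fill in automatically.
upstream = {
    "upstream_source": "hi",
    "upstream_data_type": "l1b",
    "upstream_descriptor": "45sensor-de",
}
dep_node = DependencyNode(**upstream)
# dep_node.kickoff_job is True; unknown or missing required keys
# raise TypeError at construction time, e.g. DependencyNode(bogus=1).
```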

@tech3371 tech3371 mentioned this pull request Mar 27, 2026

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Task: Transition all instrument into their own new dependency config

3 participants