Conversation
There was a problem hiding this comment.
Pull request overview
This PR continues the incremental dependency refactor by splitting pipeline dependency configurations into per-instrument YAML files and adding code/tests to load and validate the new format. It also includes related changes to dependency resolution behavior (Hi Goodtimes multi-repoint support, non-daily coverage handling), SPICE metakernel time conversion, and API-key scope enforcement.
Changes:
- Add per-instrument dependency YAML files and a new
DependencyConfigReaderto parse/validate them. - Extend dependency resolution logic (Hi Goodtimes N-nearest repoints, non-daily instrument coverage skip, helper queries + tests).
- Update API-key authorization/scope behavior and SPICE metakernel utilities/tests; bump
imap-data-access/imap-processingversions.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/lambda_endpoints/test_upload_api.py | Adds authorizer context to upload tests; adds 403 tests for read/empty scope. |
| tests/lambda_endpoints/test_spice_metakernel_api.py | Unskips/expands metakernel tests; uploads kernels to mock S3 for time conversion coverage. |
| tests/lambda_endpoints/test_scheduled_job.py | Updates scheduled job tests to use shared upload mocking fixture. |
| tests/lambda_endpoints/test_dependency_new.py | New unit tests for the new dependency YAML reader/validator. |
| tests/lambda_endpoints/test_dependency_api.py | Adds extensive test coverage for repoint/list handling, nearest-neighbor selection, and INPROGRESS skip logic. |
| tests/lambda_endpoints/test_batch_starter.py | Updates batch starter tests and adds Hi Goodtimes multi-repoint trigger + IDEX buffered query tests. |
| tests/lambda_endpoints/conftest.py | Adds mock_upload_request_success fixture to isolate tests from real upload/HTTP calls. |
| tests/infrastructure/test_api_key_dynamodb.py | Updates scope semantics and adds write/upload restriction tests for read-only keys. |
| sds_data_manager/lambda_code/processing/requirements.txt | Bumps imap-data-access and imap-processing versions for processing lambda package set. |
| sds_data_manager/lambda_code/authorization/manage_api_keys.py | Adds scope validation + improved CLI usage/docs for full vs read. |
| sds_data_manager/lambda_code/authorization/lambda_api_key_authorizer.py | Introduces scope/method/path checks and structured logging for authorization decisions. |
| sds_data_manager/lambda_code/SDSCode/spice_utilities.py | New shared SPICE helpers to download kernels from S3 and furnish “best” kernel. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/spice_indexer.py | Refactors to reuse shared SPICE utility functions. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency_new.py | New reader/validator for per-instrument dependency YAML format. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency_config.csv | Updates legacy CSV dependency mappings (notably Hi Goodtimes and some SPICE deps). |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency.py | Adds non-daily handling, session reuse, multi-repoint dependencies, nearest-neighbor helpers, and Hi Goodtimes extensions. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_swe_dependencies.yaml | New SWE dependency YAML in new format. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_swapi_dependencies.yaml | New SWAPI dependency YAML in new format. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_spacecraft_dependencies.yaml | New Spacecraft dependency YAML in new format. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_mag_dependencies.yaml | New MAG dependency YAML in new format. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_lo_dependencies.yaml | New LO dependency YAML in new format (includes anchors/aliases). |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_idex_dependencies.yaml | New IDEX dependency YAML in new format. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_hit_dependencies.yaml | New HIT dependency YAML in new format. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_hi_dependencies.yaml | New HI dependency YAML in new format (includes Goodtimes date_range). |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_glows_dependencies.yaml | New GLOWS dependency YAML in new format. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_codice_dependencies.yaml | New CoDICE dependency YAML in new format. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/batch_starter.py | Adjusts dependency config usage; adds IDEX buffered query start; implements Hi Goodtimes multi-repoint triggering; updates upload API invocation event. |
| sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/init.py | Adds NON_DAILY_INSTRUMENTS list. |
| sds_data_manager/lambda_code/SDSCode/api_lambdas/upload_api.py | Adds scope enforcement (deny read/missing scope) and logging for upload requests. |
| sds_data_manager/lambda_code/SDSCode/api_lambdas/spice_metakernel_api.py | Adds leapsecond kernel furnishing path to enable human-readable date inputs. |
| sds_data_manager/lambda_code/SDSCode/api_lambdas/metakernel.py | Refactors metakernel file selection logic and internal gap tracking structures. |
| sds_data_manager/lambda_code/IAlirtCode/ialirt_realtime.py | Sanitizes non-finite floats in JSON output before uploading to S3. |
| sds_data_manager/lambda_code/IAlirtCode/ialirt_data_query_api.py | Updates scope allowlist to include read and adjusts filtering behavior. |
| sds_data_manager/lambda_code/IAlirtCode/ialirt_coverage.py | Adds UKSA schedule ingestion and passes it into coverage generation; adjusts DSN sorting. |
| sds_data_manager/constructs/sds_api_manager_construct.py | Updates SPICE metakernel lambda env/memory/timeout; adds S3 read policy and IMAP_DATA_DIR/S3_BUCKET env vars. |
| sds_data_manager/constructs/processing_construct.py | Switches processing compute environment from spot to on-demand. |
| sds_data_manager/constructs/ialirt_processing_construct.py | Expands partner access ports for LASP/UKSA. |
| sds_data_manager/constructs/ialirt_api_manager_construct.py | Renames/re-wires IALiRT query endpoints and adds reserved concurrency; adjusts which routes are auth vs restricted. |
| pyproject.toml | Bumps imap-processing version. |
| poetry.lock | Updates lockfile for bumped dependency versions / Poetry version. |
| lambda_layer/spice/requirements.txt | Bumps imap-data-access in SPICE lambda layer. |
| lambda_layer/database/requirements.txt | Bumps imap-data-access in DB lambda layer. |
| README.md | Documents scope options (full, read) and updates API key management examples. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def validate_node(self, node: list) -> bool: | ||
| """Validate a dependency node. | ||
|
|
||
| A valid node must have exactly 5 elements or 6 elements: | ||
| ( | ||
| source, | ||
| data_type, | ||
| descriptor, | ||
| required, | ||
| kickoff_job, | ||
| Optional(past, future) | ||
| ) | ||
| If it includes past/future date ranges, it should follow the following format: | ||
| - p - pointing | ||
| - h - hourly | ||
| - d - days | ||
| - l - last_processed | ||
| - n - nearest | ||
| past and future should end with one of these options. Eg. | ||
| ("-3p", "3pm") means 3 pointing | ||
| ("-3d", "5d") means 5 days | ||
| ("-2h", "2h") means 2 hours | ||
| ("1l",) means last processed | ||
|
|
||
| Validation is performed for each field. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| node : DependencyNode | ||
| Node to validate. | ||
|
|
||
| Returns | ||
| ------- | ||
| bool | ||
| True if node is valid. | ||
|
|
||
| Raises | ||
| ------ | ||
| ValueError | ||
| If node format is invalid or contains invalid values. | ||
|
|
||
| Examples | ||
| -------- | ||
| >>> config = DependencyConfig() | ||
| >>> config.validate_node(('codice', 'l1a', 'all')) | ||
| True | ||
| >>> config.validate_node(('invalid', 'l1a', 'all')) | ||
| Traceback (most recent call last): | ||
| ... | ||
| ValueError: Invalid data source... |
There was a problem hiding this comment.
validate_node is now validating dict-shaped nodes (with upstream_* keys), but the signature/type hint and docstring still describe a tuple/list-based “5 or 6 element” format. Updating the type annotation and docstring will prevent confusion and make the public contract clear.
| def validate_node(self, node: list) -> bool: | |
| """Validate a dependency node. | |
| A valid node must have exactly 5 elements or 6 elements: | |
| ( | |
| source, | |
| data_type, | |
| descriptor, | |
| required, | |
| kickoff_job, | |
| Optional(past, future) | |
| ) | |
| If it includes past/future date ranges, it should follow the following format: | |
| - p - pointing | |
| - h - hourly | |
| - d - days | |
| - l - last_processed | |
| - n - nearest | |
| past and future should end with one of these options. Eg. | |
| ("-3p", "3pm") means 3 pointing | |
| ("-3d", "5d") means 5 days | |
| ("-2h", "2h") means 2 hours | |
| ("1l",) means last processed | |
| Validation is performed for each field. | |
| Parameters | |
| ---------- | |
| node : DependencyNode | |
| Node to validate. | |
| Returns | |
| ------- | |
| bool | |
| True if node is valid. | |
| Raises | |
| ------ | |
| ValueError | |
| If node format is invalid or contains invalid values. | |
| Examples | |
| -------- | |
| >>> config = DependencyConfig() | |
| >>> config.validate_node(('codice', 'l1a', 'all')) | |
| True | |
| >>> config.validate_node(('invalid', 'l1a', 'all')) | |
| Traceback (most recent call last): | |
| ... | |
| ValueError: Invalid data source... | |
| def validate_node(self, node: dict) -> bool: | |
| """Validate a dependency node. | |
| A valid node is a mapping describing a single upstream dependency with | |
| the following fields: | |
| - ``upstream_source`` (str) | |
| - ``upstream_data_type`` (str) | |
| - ``upstream_descriptor`` (str) | |
| - ``required`` (bool) | |
| - ``kickoff_job`` (bool) | |
| - optional ``date_range`` describing the past/future window | |
| The optional date range, when present, controls how far in the past and | |
| future to look for matching data. It uses a compact string format, where | |
| the unit is encoded in the suffix: | |
| - ``p`` - pointing | |
| - ``h`` - hourly | |
| - ``d`` - days | |
| - ``l`` - last_processed | |
| - ``n`` - nearest | |
| Past and future values must end with one of these unit specifiers. For | |
| example: | |
| - ``("-3p", "3p")`` means 3 pointings in the past and 3 in the future | |
| - ``("-3d", "5d")`` means from 3 days in the past to 5 days in the future | |
| - ``("-2h", "2h")`` means from 2 hours in the past to 2 hours in the future | |
| - ``("1l",)`` means the last processed value | |
| Validation is performed for each field via the internal helpers on this | |
| class. | |
| Parameters | |
| ---------- | |
| node : dict | |
| Dependency node to validate. | |
| Returns | |
| ------- | |
| bool | |
| True if the node is valid. | |
| Raises | |
| ------ | |
| ValueError | |
| If the node format is invalid or contains invalid values. | |
| Examples | |
| -------- | |
| >>> config = DependencyConfigReader() | |
| >>> node = { | |
| ... "upstream_source": "codice", | |
| ... "upstream_data_type": "l1a", | |
| ... "upstream_descriptor": "all", | |
| ... "required": True, | |
| ... "kickoff_job": False, | |
| ... } | |
| >>> config.validate_node(node) | |
| True |
|
I will switch base to |
| - upstream_source: hi | ||
| upstream_data_type: l1b | ||
| upstream_descriptor: 45sensor-de | ||
| date_range: ["-3p", "3p"] |
There was a problem hiding this comment.
Please look here on how to use optional date range field.
| - upstream_source: planetary_ephemeris | ||
| upstream_data_type: spice | ||
| upstream_descriptor: historical | ||
| kickoff_job: false |
There was a problem hiding this comment.
kickoff_job is true by default. If not true, then we need to overwrite like this. Commenting on this as an example and information to look out for.
This helps with removing lot of duplicate lines.
Key and value were added to make reading in these information in the code easier and reduce human error of putting information in this file. List can cause issue if information is put at wrong index but key and value pairs reduces that risks. That's why I went with this approach.
|
@pleasant-menlo I am tagging you on these PR for awareness purposes only. I don't expect our current CLI inputs or how dependency information is passed to change. This is for internal SDC backend improvement and features we need to add to support new cases that has come up in L1 and L2. You can expect L3 inputs to be same as what it has been. So no worries there :). But since how we store dependency information is changing, I am tagging you for awareness since you have been very helpful there for L3. |
subagonsouth
left a comment
There was a problem hiding this comment.
Here is some feedback... some of it is very nit picky and likely to be somewhat opinionated.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_hi_dependencies.yaml
Outdated
Show resolved
Hide resolved
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_hi_dependencies.yaml
Show resolved
Hide resolved
| kickoff_job: false | ||
|
|
||
|
|
||
| spice_45sensor_l1b: &spice_45sensor_l1b |
There was a problem hiding this comment.
spice_45sensor_l1b and spice_90sensor_l1b are identical and should be reduced to one definition of spice_l1b_de as they are specific to just the DE products.
There was a problem hiding this comment.
This is a bit of a nit pick, but I think that ordering of the definitions is important in these files for readability. I would suggest:
- common definitions shared across many/all products. e.g.
spice_basic. - Sub-sections ordered by level then type. In each sub-section order is:
- common definitions for sub-section. e.g.
l1b_de_spice,l2_spice - individual sensor definitions. e.g.
l2_45sensor_ancillary - individual product definitions. e.g.
(l2, h45-ena-h-hf-nsp-anti-hae-4deg-6mo)
- common definitions for sub-section. e.g.
I like how in the current CSV, it is easy to see all dependencies for a single product without scrolling all over the file.
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependencies/imap_hi_dependencies.yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
This file looks good to me.
| # Anchor keys don't have parentheses, so skip them. | ||
| if key_str.startswith("#") or not key_str.startswith("("): | ||
| continue | ||
|
|
There was a problem hiding this comment.
No need to skip comments. They are standard in YAML and will automatically get ignored by the reader.
| # Skip any anchor definitions (common dependency groups). | |
| # Product keys have format: (data_type, descriptor) | |
| # Anchor keys don't have parentheses, so skip them. | |
| if not key_str.startswith("("): | |
| continue |
There was a problem hiding this comment.
didn't know that. I will update that!
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency_new.py
Outdated
Show resolved
Hide resolved
| f"Node dict must contain keys {required_keys}, got {set(node.keys())}" | ||
| ) | ||
|
|
||
| def _unpack_node(self, node): |
There was a problem hiding this comment.
Could consider making a node definition a NamedTuple or DataClass. I think that would perhaps reduce some of the required validation.
There was a problem hiding this comment.
I tried using DependencyNode but I realized that we will need to add required, kickoff_job and so on to the definition. I can do that if this is preferred.
There was a problem hiding this comment.
Yep, defaults are easy to set in dataclass. So either add them to DependencyNode or make a class that inherits from DependencyNode and adds the default fields.
Then, I think you get some validation for free when instantiating like:
dep_node = DependencyNode(**upstream)
Change Summary
closes #1153
Overview
This are changes that we are building up slowing into dependency_refactor branch. These won't be implemented and push to production yet. It's incremental dependency refactoring changes.
File changes
Added new dependencies files for individual instrument team. Each file contains data products and their upstream dependency. It is closely related to documentation on this PR. Both are dependent on each other and build upon each other. Once this PR is in good shape and ready to merge, I plan to go back and update that documentation PR based on outcome from this PR.
In this PR, main changes are separating dependencies for each instrument into their own files and then format in new design. Then added code that reads and validate dependency node information.
Testing