Commit 212f1a9

fix(#177): fixed fixes_to_apply persistence and documented
1 parent 85d07a0 commit 212f1a9

4 files changed: +33 -9 lines changed

README.md

Lines changed: 4 additions & 0 deletions
@@ -31,9 +31,13 @@ scripts/run-with-local-http.sh

 This project includes a convenience script to index and serve a remote STAC catalog. This script will fully index the remote STAC catalog each time it is run. This may not be the most efficient way to meet your needs, but it does help demonstrate some of this project's capabilities.

+This script can optionally be called with a comma-separated list of STAC item JSON fixers, invoking the behaviour described [here](./docs/index-config.md#fixes).
+
 ```sh
 # indexes a public static STAC catalog over HTTPS and runs the API
 scripts/run-with-remote-source.sh https://esa.pages.eox.at/cubes-and-clouds-catalog/MOOC_Cubes_and_clouds/catalog.json
+# indexes and attempts to apply a single fixer if necessary
+scripts/run-with-remote-source.sh https://esa.pages.eox.at/cubes-and-clouds-catalog/MOOC_Cubes_and_clouds/catalog.json --fixes_to_apply eo-extension-uri
 ```

 Output includes the following information about the index.

docs/index-config.md

Lines changed: 11 additions & 6 deletions
@@ -1,15 +1,13 @@
 # Index Configuration

 The indexer requires exactly one of the following arguments
-- `--root_catalog_uri` referencing the location of a STAC catalog JSON
-- `--manifest_json_uri` referencing the index manifest from a prior indexer run
+- `--root_catalog_uri` referencing the location of a STAC catalog JSON.
+- `--manifest_json_uri` referencing the index manifest from a prior indexer run.

-The indexer can optionally accept an argument referencing a JSON index configuration file, which offers greater control over indexer behaviour. The following describes that file's content.
+When indexing a new STAC catalog (i.e. not updating an existing index) the indexer can optionally accept an argument referencing a JSON index configuration file, which offers greater control over indexer behaviour. The following describes that file's content.

 ## Optional Properties

-Any number of queryable and sortable STAC properties may be configured.
-
 ### Indexables

 The indexer requires knowledge of the DuckDB data type that can be used to store queryable or sortable properties. Because properties can be both queryable _and_ sortable this configuration is maintained in the `indexables` property to avoid duplication.
@@ -24,6 +22,10 @@ Each queryable and sortable property must include a list of collections for whic

 Queryables require a `json_schema` property containing a schema that could be used to validate values of this property. This JSON schema is not used directly by the API but is provided to API clients via the `/queryables` endpoints such that a client can validate any value it intends to send as query value for this property.

+### Fixes
+
+The indexer attempts to parse STAC item JSON using [stac-pydantic](https://pypi.org/project/stac-pydantic/). stac-pydantic is not particularly lenient and will reject invalid JSON, resulting in the STAC item not being indexed and an error in the indexer log. This may be valid in some use-cases, but in cases where STAC item JSON cannot be fixed, and may not be owned or controlled by the indexer's user, it might be preferable to index invalid JSON. The indexer supports a `fixes_to_apply` property. This property accepts a list of fixer names to attempt to apply to invalid JSON. Fixers are defined [in code](../packages/stac-index/src/stac_index/indexer/stac_parser.py) and must exist before being referenced here. The list of available fixers is currently short and may be expanded in future to accommodate common validity problems.
+
 ## Example

 ```json
@@ -52,6 +54,9 @@ Queryables require a `json_schema` property containing a schema that could be us
         "joplin"
       ]
     }
-  }
+  },
+  "fixes_to_apply": [
+    "eo-extension-uri"
+  ]
 }
 ```
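
The "Fixes" paragraph added in this file describes a parse, fix, re-parse flow driven by a list of fixer names. The following is a minimal sketch of that idea only, not the code in `stac_parser.py`: the `parse_item` validator, the `placeholder-id` fixer, the `FIXERS` registry and the error handling for unknown names are all hypothetical stand-ins, and stac-pydantic is deliberately not used so the sketch stays self-contained.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


def parse_item(item: Item) -> Item:
    """Hypothetical strict parser standing in for stac-pydantic validation."""
    if "id" not in item:
        raise ValueError("missing required field: id")
    return item


def _add_placeholder_id(item: Item) -> Item:
    """Hypothetical fixer: supplies an id when the item has none."""
    fixed = dict(item)
    fixed.setdefault("id", "unknown")
    return fixed


# Hypothetical registry: a fixer must be defined here before it can be named.
FIXERS: Dict[str, Callable[[Item], Item]] = {
    "placeholder-id": _add_placeholder_id,
}


def parse_with_fixes(item: Item, fixes_to_apply: List[str]) -> Item:
    """Parse an item; if that fails, apply the named fixers and parse again."""
    try:
        return parse_item(item)
    except ValueError:
        pass  # fall through and attempt the configured fixes
    for name in fixes_to_apply:
        fixer = FIXERS.get(name)
        if fixer is None:
            # Assumption: an unknown fixer name is treated as a config error.
            raise KeyError(f"unknown fixer: {name}")
        item = fixer(item)
    # A second failure propagates, so the item would be reported as an error.
    return parse_item(item)
```

In this sketch fixers run only after the first parse fails, which matches the "if necessary" wording in the README example; whether the real indexer applies them lazily or eagerly is not stated in this commit.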

packages/stac-index/src/stac_index/indexer/creator/creator.py

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,6 @@ async def _index_stac_source(
         self: Self,
         root_catalog_uri: str,
         index_config: Optional[IndexConfig] = None,
-        output_dir: Optional[str] = None,
     ) -> Tuple[List[IndexingError], str]:
         _logger.info(f"indexing stac source for load {self._load_id}")
         self._create_db_objects()
@@ -98,6 +97,7 @@ async def _index_stac_source(
             collection_errors + items_errors,
             self._export_db_objects(
                 root_catalog_uri=root_catalog_uri,
+                index_config=index_config,
             ),
         )

scripts/run-with-remote-source.sh

Lines changed: 17 additions & 2 deletions
@@ -10,6 +10,21 @@ if [ "$#" -lt 1 ]; then
 fi

 export root_catalog_uri="$1"
+shift
+
+fixes_to_apply=""
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        --fixes_to_apply)
+            fixes_to_apply="$2"
+            shift; shift
+            ;;
+        *)
+            echo "Unknown option $1"
+            exit 1
+            ;;
+    esac
+done

 if [[ $root_catalog_uri == s3://* ]]; then
     echo; echo "* Assumes \$AWS_ACCESS_KEY_ID, \$AWS_REGION, \$AWS_SECRET_ACCESS_KEY, and (optionally) \$AWS_SESSION_TOKEN are set for obstore *"; echo
@@ -25,9 +40,9 @@ if [ -f $"$tmp_index_path/manifest.json" ]; then
     unset root_catalog_uri
 else
     # No point evaluating this if updating an existing index as it will be ignored.
-    if [ -n "${FIXES_TO_APPLY}" ]; then
+    if [ -n "$fixes_to_apply" ]; then
         export tmp_index_config_path=$(mktemp)
-        fixes_json=$(echo "${FIXES_TO_APPLY}" | sed "s/,\s*/\", \"/g")
+        fixes_json=$(echo "$fixes_to_apply" | sed "s/,\s*/\", \"/g")
         echo "{\"fixes_to_apply\": [\"${fixes_json}\"]}" > $tmp_index_config_path
     fi
 fi
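
The last hunk turns the comma-separated `--fixes_to_apply` value into a small JSON index config by string substitution. As a rough illustration of the same transformation, here is a Python sketch; `parse_fixer_names` and `build_index_config` are hypothetical helper names that do not exist in this repository, and unlike the `sed` expression this version also drops empty entries.

```python
import json
import re
from typing import List


def parse_fixer_names(fixes_to_apply: str) -> List[str]:
    """Split a comma-separated fixer list, tolerating whitespace after commas."""
    parts = re.split(r",\s*", fixes_to_apply.strip())
    # Drop empty entries, e.g. from an empty string or a trailing comma.
    return [name for name in parts if name]


def build_index_config(fixes_to_apply: str) -> str:
    """Return the JSON document the script writes to its temporary config file."""
    return json.dumps({"fixes_to_apply": parse_fixer_names(fixes_to_apply)})


print(build_index_config("eo-extension-uri"))
# prints {"fixes_to_apply": ["eo-extension-uri"]}
```

Building the document with `json.dumps` rather than string substitution means quoting and escaping are handled by the JSON encoder, which may matter if a fixer name ever contains a quote or backslash.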
