
Commit 85d07a0

feat(#175): remote source example
* feat(#175): retaining index between remote source runs
* feat(#175): better described remote-source script in README
1 parent 0331eb7 commit 85d07a0

7 files changed: +66 -22 lines changed


.gitignore

Lines changed: 2 additions & 0 deletions
@@ -23,3 +23,5 @@ dist
 
 # indexer when run locally
 packages/stac-index/src/stac_index/indexer/index_data/
+# indexer when run by `run-with-remote-source.sh`
+.remote-source-index

README.md

Lines changed: 34 additions & 5 deletions
@@ -16,13 +16,42 @@ See [Development](./docs/development.md) for detailed guidance on how to work on
 
 ### Quickstart
 
-To get a quick demo up and running execute any of the following scripts and navigate to http://localhost:8123/api.html
+To get a quick demo up and running with a test dataset execute any of the following scripts and navigate to http://localhost:8123/api.html
 
 ```sh
-scripts/run-with-local-s3.sh # loads a sample dataset into minio, indexes it, loads the index into minio, and runs the API
-scripts/run-with-local-file.sh # indexes a sample dataset on the filesystem and runs the API
-scripts/run-with-local-http.sh # loads a sample dataset into a HTTP fileserver, indexes it, and runs the API
-scripts/run-with-remote-source.sh https://capella-open-data.s3.us-west-2.amazonaws.com/stac/catalog.json # indexes a public static STAC catalog over HTTPS and runs the API
+# loads a sample dataset into minio, indexes it, loads the index into minio, and runs the API
+scripts/run-with-local-s3.sh
+# indexes a sample dataset on the filesystem and runs the API
+scripts/run-with-local-file.sh
+# loads a sample dataset into a HTTP fileserver, indexes it, and runs the API
+scripts/run-with-local-http.sh
+```
+
+### Index Remote STAC Catalog
+
+This project includes a convenience script to index and serve a remote STAC catalog. This script will fully index the remote STAC catalog each time it is run. This may not be the most efficient way to meet your needs, but it does help demonstrate some of this project's capabilities.
+
+```sh
+# indexes a public static STAC catalog over HTTPS and runs the API
+scripts/run-with-remote-source.sh https://esa.pages.eox.at/cubes-and-clouds-catalog/MOOC_Cubes_and_clouds/catalog.json
+```
+
+Output includes the following information about the index.
+```sh
+* Indexing may take some time, depending on the size of the catalog
+* Indexing to /.../source/sparkgeo/STAC-API-Serverless/.remote-source-index/httpsesapageseoxatcubesandcloudscatalogMOOCCubesandcloudscatalogjson
+```
+
+The generated index files can be inspected at `.../.remote-source-index/httpsesapageseoxatcubesandcloudscatalogMOOCCubesandcloudscatalogjson` if necessary. If at a later time you want to run the API against this same index, without re-indexing the remote STAC catalog, this can be achieved with the following:
+
+```sh
+docker run \
+  --rm \
+  -it \
+  -v $PWD/.remote-source-index/httpsesapageseoxatcubesandcloudscatalogMOOCCubesandcloudscatalogjson:/index:ro \
+  -e stac_api_indexed_index_manifest_uri=/index/manifest.json \
+  -p 8123:80 \
+  sparkgeo/stac_fastapi_indexed
 ```
 
 ## Overview

docker-compose.remote-source.yml

Lines changed: 5 additions & 4 deletions
@@ -8,18 +8,18 @@ services:
       INDEX_ROOT_CATALOG_URI: ${root_catalog_uri}
       INDEX_CONFIG_PATH: /index-config.json
       INDEX_PUBLISH_PATH: /output
+      INDEX_MANIFEST_JSON_URI: ${index_manifest_json_uri}
       AWS_ACCESS_KEY_ID:
       AWS_REGION:
       AWS_SECRET_ACCESS_KEY:
       AWS_SESSION_TOKEN:
     volumes:
-      - "indexer-output:/output:rw"
-      # this compose file will not work without $tmp_index_config_path being set (this is intentional, it is set by scripts/run-with-remote-source.sh)
-      - "${tmp_index_config_path}:/index-config.json:ro"
+      - "${tmp_index_path:-indexer-output}:/output:rw"
+      - "${tmp_index_config_path:-indexer-config-fallback}:/index-config.json:ro"
 
   api:
     volumes:
-      - indexer-output:/index:ro
+      - "${tmp_index_path:-indexer-output}:/index:ro"
     environment:
       stac_api_indexed_index_manifest_uri: /index/manifest.json
       AWS_ACCESS_KEY_ID:
@@ -32,3 +32,4 @@ services:
 
 volumes:
   indexer-output:
+  indexer-config-fallback:
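
Review note: the new volume entries rely on Compose's `${VAR:-default}` interpolation, which falls back to the default when the variable is unset or empty. The sketch below mimics that rule in Python to show which volume source each case resolves to. The helper name `resolve_with_default` and the sample path are hypothetical and not part of this commit.

```python
def resolve_with_default(env: dict, name: str, default: str) -> str:
    """Mimic ${name:-default}: use the default when the variable is unset OR empty."""
    value = env.get(name)
    return value if value else default


# Script exported a host path: the bind mount is used (path is illustrative only).
with_path = resolve_with_default(
    {"tmp_index_path": "/work/.remote-source-index/example"},
    "tmp_index_path",
    "indexer-output",
)

# Variable not exported: Compose would fall back to the named volume.
without_path = resolve_with_default({}, "tmp_index_path", "indexer-output")

# Empty value is treated the same as unset.
empty_path = resolve_with_default({"tmp_index_path": ""}, "tmp_index_path", "indexer-output")

print(with_path, without_path, empty_path)
```

When `tmp_index_config_path` is not exported, the `indexer-config-fallback` named volume is mounted at `/index-config.json`, so that path is presumably a directory rather than a file. That would explain the added `[ -f "$INDEX_CONFIG_PATH" ]` check in `docker/indexer/command.sh`.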

docker/indexer/command.sh

Lines changed: 2 additions & 2 deletions
@@ -12,10 +12,10 @@ if [ -n "$INDEX_ROOT_CATALOG_URI" ]; then
 fi
 
 if [ -n "$INDEX_MANIFEST_JSON_URI" ]; then
-  manifest_json_uri="--manifest_json_uri $INDEX_MANIFEST_JSON_URI"
+  manifest_json_uri_argument="--manifest_json_uri $INDEX_MANIFEST_JSON_URI"
 fi
 
-if [ -n "$INDEX_CONFIG_PATH" ]; then
+if [ -n "$INDEX_CONFIG_PATH" ] && [ -f "$INDEX_CONFIG_PATH" ]; then
   index_config_argument="--index_config $INDEX_CONFIG_PATH"
 fi
 

packages/stac-index/src/stac_index/indexer/creator/creator.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -471,7 +471,10 @@ def _log_index_event(self: Self, root_catalog_uri: str) -> None:
471471

472472
def _insert_errors(self: Self, errors: list[IndexingError]) -> None:
473473
for error in errors:
474-
save_error(self._conn, error)
474+
try:
475+
save_error(self._conn, error)
476+
except Exception as e:
477+
_logger.exception("failed to insert indexing error: {}".format(e))
475478

476479
async def _load_existing_index(self: Self, manifest_json_uri: str) -> IndexManifest:
477480
source_reader = get_reader_for_uri(uri=manifest_json_uri)
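
Review note: the change to `_insert_errors` wraps each save in its own `try`/`except`, so one failing record no longer aborts the rest of the loop. A minimal standalone sketch of that pattern follows. `insert_all` and `flaky_save` are hypothetical stand-ins for the method and for `save_error`, and the failure count is an addition for illustration only.

```python
import logging

_logger = logging.getLogger(__name__)


def insert_all(errors, save):
    """Try to save every error; log failures and return how many could not be saved."""
    failed = 0
    for error in errors:
        try:
            save(error)
        except Exception:
            # logger.exception records the active traceback at ERROR level.
            _logger.exception("failed to insert indexing error")
            failed += 1
    return failed


saved = []


def flaky_save(error):
    """Stand-in for save_error: rejects one specific record."""
    if error == "bad":
        raise ValueError("cannot store this one")
    saved.append(error)


failures = insert_all(["a", "bad", "b"], flaky_save)
print(saved, failures)
```

Since `logger.exception` already attaches the exception and its traceback, formatting `e` into the message as the commit does is harmless but repeats information.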

scripts/run-with-remote-source.sh

Lines changed: 17 additions & 9 deletions
@@ -4,27 +4,35 @@ set -e
 
 pushd $(dirname $0)/..
 
-if [ "$#" -ne 1 ]; then
+if [ "$#" -lt 1 ]; then
   echo "Usage: $0 <root-catalog-uri>"
   exit 1
 fi
 
 export root_catalog_uri="$1"
+
 if [[ $root_catalog_uri == s3://* ]]; then
   echo; echo "* Assumes \$AWS_ACCESS_KEY_ID, \$AWS_REGION, \$AWS_SECRET_ACCESS_KEY, and (optionally) \$AWS_SESSION_TOKEN are set for obstore *"; echo
 fi
 
-export tmp_index_config_path=$(mktemp)
-if [ -z "${FIXES_TO_APPLY}" ]; then
-  echo "{}" > $tmp_index_config_path
+export tmp_index_path=$PWD/.remote-source-index/$(echo "$root_catalog_uri" | tr -cd '[:alnum:]')
+echo; echo "* Indexing may take some time, depending on the size of the catalog";
+echo "* Indexing to $tmp_index_path"; echo
+# Persist generated index and manifest files locally to support faster repeat runs against the same remote source.
+if [ -f $"$tmp_index_path/manifest.json" ]; then
+  # Tell the indexer there's already an existing index to update.
+  export index_manifest_json_uri="/output/manifest.json"
+  unset root_catalog_uri
 else
-  fixes_json=$(echo "${FIXES_TO_APPLY}" | sed "s/,\s*/\", \"/g")
-  echo "{\"fixes_to_apply\": [\"${fixes_json}\"]}" > $tmp_index_config_path
+  # No point evaluating this if updating an existing index as it will be ignored.
+  if [ -n "${FIXES_TO_APPLY}" ]; then
+    export tmp_index_config_path=$(mktemp)
+    fixes_json=$(echo "${FIXES_TO_APPLY}" | sed "s/,\s*/\", \"/g")
+    echo "{\"fixes_to_apply\": [\"${fixes_json}\"]}" > $tmp_index_config_path
+  fi
 fi
 
-dco="docker compose -f docker-compose.base.yml -f docker-compose.remote-source.yml"
 
+dco="docker compose -f docker-compose.base.yml -f docker-compose.remote-source.yml"
 $dco build
-echo; echo "* Indexing may take some time, depending on the size of the catalog *"; echo
-sleep 1
 $dco up --force-recreate
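
Review note: two small transformations in this script are easy to misread, so here is a rough Python equivalent of each. `index_dir_name` mirrors `tr -cd '[:alnum:]'` (assuming ASCII input, as in the C locale), and `fixes_config` mirrors the `sed` expression that turns a comma-separated `FIXES_TO_APPLY` into a JSON array. Both function names and the fix names `fix-a` and `fix-b` are hypothetical.

```python
import json
import re


def index_dir_name(root_catalog_uri: str) -> str:
    """Keep only ASCII letters and digits, like `tr -cd '[:alnum:]'`."""
    return "".join(ch for ch in root_catalog_uri if ch.isascii() and ch.isalnum())


def fixes_config(fixes_to_apply: str) -> str:
    """Split on commas (plus any following whitespace) and emit the index config JSON."""
    return json.dumps({"fixes_to_apply": re.split(r",\s*", fixes_to_apply)})


uri = "https://esa.pages.eox.at/cubes-and-clouds-catalog/MOOC_Cubes_and_clouds/catalog.json"
print(index_dir_name(uri))
print(fixes_config("fix-a, fix-b"))
```

Because every non-alphanumeric character is dropped, two different URIs could in principle map to the same directory name (for example, URIs that differ only in punctuation). For a demo script that is probably acceptable, but it is worth knowing when reusing `.remote-source-index`.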

src/stac_fastapi/indexed/settings.py

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 from functools import lru_cache
 from typing import Optional
+from uuid import uuid4
 
 from stac_fastapi.types.config import ApiSettings, SettingsConfigDict
 
@@ -10,7 +11,7 @@ class _Settings(ApiSettings):
     )
     log_level: str = "info"
     index_manifest_uri: str = "/index/manifest.json"
-    token_jwt_secret: str
+    token_jwt_secret: str = uuid4().hex
     duckdb_threads: Optional[int] = None
     deployment_root_path: Optional[str] = None
     install_duckdb_extensions: bool = (
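
Review note: a default written as `uuid4().hex` in a class body is evaluated once, when the class is defined, not once per instance. The sketch below shows this with a plain dataclass. `DemoSettings` is a hypothetical stand-in, not the project's `_Settings`, and it is an assumption here that the pydantic-based settings class treats a plain default value the same way.

```python
from dataclasses import dataclass
from uuid import uuid4


@dataclass
class DemoSettings:
    # Evaluated a single time, when the class body runs at import.
    token_secret: str = uuid4().hex


first = DemoSettings()
second = DemoSettings()
override = DemoSettings(token_secret="from-environment")

# Instances in the same process share the generated default;
# an explicitly supplied value still takes precedence.
print(first.token_secret == second.token_secret, override.token_secret)
```

If the secret is used to sign tokens, as its name suggests, then each process (and each restart) would get a different default, so tokens issued by one process may not validate in another unless the value is set explicitly. That seems fine for the quickstart `docker run` case this commit targets.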
