
Commit 28b857c

adopt curl -f to make the data loader fail if curl fails to download (#1667)
1 parent 312f4a9 commit 28b857c

File tree

8 files changed: +10 −10 lines changed


docs/data-loaders.md (+3 −3)

@@ -16,7 +16,7 @@ Data loaders are polyglot: they can be written in any programming language. They
 A data loader can be as simple as a shell script that invokes [curl](https://curl.se/) to fetch recent earthquakes from the [USGS](https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php):
 
 ```sh
-curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
+curl -f https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
 ```
 
 Data loaders use [file-based routing](#routing), so assuming this shell script is named `quakes.json.sh`, a `quakes.json` file is then generated at build time. You can access this file from the client using [`FileAttachment`](./files):

@@ -230,7 +230,7 @@ If multiple requests are made concurrently for the same data loader, the data lo
 
 ## Output
 
-Data loaders must output to [standard output](<https://en.wikipedia.org/wiki/Standard_streams#Standard_output_(stdout)>). The first extension (such as `.csv`) does not affect the generated snapshot; the data loader is solely responsible for producing the expected output (such as CSV). If you wish to log additional information from within a data loader, be sure to log to standard error, say by using [`console.warn`](https://developer.mozilla.org/en-US/docs/Web/API/console/warn) or `process.stderr`; otherwise the logs will be included in the output file and sent to the client.
+Data loaders must output to [standard output](<https://en.wikipedia.org/wiki/Standard_streams#Standard_output_(stdout)>). The first extension (such as `.csv`) does not affect the generated snapshot; the data loader is solely responsible for producing the expected output (such as CSV). If you wish to log additional information from within a data loader, be sure to log to standard error, say by using [`console.warn`](https://developer.mozilla.org/en-US/docs/Web/API/console/warn) or `process.stderr`; otherwise the logs will be included in the output file and sent to the client. If you use `curl` as above, we recommend the `-f` flag (equivalently, the `--fail` option) to make the data loader return an error when the download fails.
 
 ## Building
 

@@ -247,7 +247,7 @@ Data loaders generate files at build time that live alongside other [static file
 Where `quakes.json.sh` is:
 
 ```sh
-curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
+curl -f https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
 ```
 
 This will produce the following output root:

docs/data/dft-road-collisions.csv.sh (+1 −1)

@@ -5,7 +5,7 @@ TMPDIR="docs/.observablehq/cache/"
 
 # Download the data (if it’s not already in the cache).
 if [ ! -f "$TMPDIR/dft-collisions.csv" ]; then
-  curl "$URL" -o "$TMPDIR/dft-collisions.csv"
+  curl -f "$URL" -o "$TMPDIR/dft-collisions.csv"
 fi
 
 # Generate a CSV file using DuckDB.

docs/quakes.json.sh (+1 −1)

@@ -1 +1 @@
-curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
+curl -f https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson

@@ -1,4 +1,4 @@
-curl 'https://www.eia.gov/electricity/930-api//respondents/data?type\[0\]=BA&type\[1\]=BR' \
+curl -f 'https://www.eia.gov/electricity/930-api//respondents/data?type\[0\]=BA&type\[1\]=BR' \
   -H 'Connection: keep-alive' \
   -A 'Chrome/123.0.0.0' \
   --compressed
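The motivation for the `-f` change above: without it, curl treats an HTTP error response (such as a 404) as a successful transfer, printing the server's error page to standard output and exiting 0, so the bad body would be saved as the data snapshot; with `-f`, curl withholds the body and exits non-zero (22, curl's "HTTP page not retrieved" status), which fails the loader. A minimal offline sketch, using hypothetical stand-ins for the two behaviors rather than real network calls:

```shell
# Hypothetical stand-ins, not real curl invocations: fake_curl mimics
# plain curl, which prints the server's HTML error page and still exits 0
# on an HTTP 404; fake_curl_f mimics `curl -f`, which withholds the body
# and exits 22.
fake_curl() { echo "<html>404 Not Found</html>"; return 0; }
fake_curl_f() { return 22; }

body=$(fake_curl)          # exit 0: the 404 page would become quakes.json
fake_curl_f || status=$?   # non-zero exit: the loader fails fast instead

echo "without -f: exit 0, error page captured"
echo "with -f: exit status $status"
```

With `-f`, the non-zero exit status propagates to the build, which is what makes a broken download visible instead of silently cached.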

examples/loader-census/src/data/ca.json.sh (+1 −1)

@@ -1,6 +1,6 @@
 # Download the ZIP archive from the Census Bureau (if needed).
 if [ ! -f src/.observablehq/cache/cb_2023_06_cousub_500k.zip ]; then
-  curl -o src/.observablehq/cache/cb_2023_06_cousub_500k.zip 'https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_06_cousub_500k.zip'
+  curl -f -o src/.observablehq/cache/cb_2023_06_cousub_500k.zip 'https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_06_cousub_500k.zip'
 fi
 
 # Unzip the ZIP archive to extract the shapefile.
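This loader (like the DuckDB loaders in this commit) follows a download-if-missing cache pattern: fetch only when the cached file is absent, then process it. A runnable offline sketch of that pattern, with `cp` and mktemp paths standing in for `curl -f -o` and the real `src/.observablehq/cache/` directory:

```shell
# Offline sketch of the download-if-missing pattern: `cp` and temporary
# paths stand in for `curl -f -o` and the loader's cache directory, so
# this runs without a network.
CACHE=$(mktemp -d)
SRC=$(mktemp)
echo "zip-bytes" > "$SRC"

# Only "download" when the cached copy is absent; in the real loader,
# a failed `curl -f` exits non-zero here and the loader stops.
if [ ! -f "$CACHE/archive.zip" ]; then
  cp "$SRC" "$CACHE/archive.zip"   # real loader: curl -f -o "$CACHE/archive.zip" "$URL"
fi

cat "$CACHE/archive.zip"
```

On later runs the `[ ! -f … ]` guard skips the download entirely, which is why these loaders stay fast once the cache is warm.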

examples/loader-census/src/index.md (+1 −1)

@@ -13,7 +13,7 @@ Next, here’s a bash script, `ca.json.sh`:
 ```bash
 # Download the ZIP archive from the Census Bureau (if needed).
 if [ ! -f src/.observablehq/cache/cb_2023_06_cousub_500k.zip ]; then
-  curl -o src/.observablehq/cache/cb_2023_06_cousub_500k.zip 'https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_06_cousub_500k.zip'
+  curl -f -o src/.observablehq/cache/cb_2023_06_cousub_500k.zip 'https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_06_cousub_500k.zip'
 fi
 
 # Unzip the ZIP archive to extract the shapefile.

examples/loader-duckdb/src/educ_uoe_lang01.parquet.sh (+1 −1)

@@ -6,7 +6,7 @@ TMPDIR="src/.observablehq/cache/"
 
 # Download the data (if it’s not already in the cache).
 if [ ! -f "$TMPDIR/$CODE.csv" ]; then
-  curl "$URL" -o "$TMPDIR/$CODE.csv"
+  curl -f "$URL" -o "$TMPDIR/$CODE.csv"
 fi
 
 # Generate a Parquet file using DuckDB.

examples/loader-duckdb/src/index.md (+1 −1)

@@ -11,7 +11,7 @@ TMPDIR="src/.observablehq/cache/"
 
 # Download the data (if it’s not already in the cache).
 if [ ! -f "$TMPDIR/$CODE.csv" ]; then
-  curl "$URL" -o "$TMPDIR/$CODE.csv"
+  curl -f "$URL" -o "$TMPDIR/$CODE.csv"
 fi
 
 # Generate a Parquet file using DuckDB.
