AB#62209: Add local backup and error handling into prepare_data.sh #9


Merged: 5 commits, Jul 21, 2025

Conversation

e-halinen (Contributor)

No description provided.

@haphut left a comment

Great start at making this thing more reliable!

Note that I have not delved deeply into the larger context of this script or into the overall process and data flow of hsl-map-matcher. Some thoughts anyway:

  1. Bash is great for one-off operations, for combining tools with pipes, or maaaybe for file operations.

    Once a Bash script reaches 20-50 lines, I would consider migrating to a language with a friendlier developer experience. In this case, TypeScript.

    The script downloads a new data set and runs binaries against it. This logic can be done in TypeScript. That would take more development work right now, though.

  2. If you think we should continue with a Bash script for now, I would change the following:

    1. Add the great default settings, `set -Eeuo pipefail`, to the top. Maybe then just do not handle errors at all and let the script fail on any error, depending on how the TypeScript wrapper reacts to that.
    2. Create a temporary directory, set a trap to remove it, and download and process the new data in a subdirectory of that temporary directory. If everything goes well, i.e. the script does not end prematurely with a non-zero exit code, replace the previous directory with the new one. This way a working copy of the data is always available.

    Pseudocode:

    #!/bin/bash
    
    set -Eeuo pipefail
    
    readonly OSM_DATA_URL="${OSM_DATA_URL}"
    readonly DATA_DIR='data'
    readonly PROFILE_SOURCE_DIR='./osrm-profiles'
    readonly OSM_DATA_FILE='map-data.osm.pbf'
    readonly OSRM_BIN_PATH='./node_modules/@project-osrm/osrm/lib/binding'
    
    tmp_dir="$(mktemp -d)"
    trap 'rm -rf "${tmp_dir}"' EXIT
    
    readonly NEW_DATA_DIR="${tmp_dir}/new-data-dir"
    
    # curl into NEW_DATA_DIR
    # run osrm binaries against NEW_DATA_DIR:
    # "${OSRM_BIN_PATH}" blah-blah
    
    # Two options:
    #
    ## Easier but not atomic:
    # rm -rf "${DATA_DIR}"
    # mv "${NEW_DATA_DIR}" "${DATA_DIR}"
    #
    ## Harder but atomic:
    # 1. Move NEW_DATA_DIR out of /tmp similarly to above.
    # 2. Create a symlink (`ln -s`) or move the existing symlink (`mv -T`) to
    #    point to the latest data directory. That's atomic.
    # 3. Remove the older directory. Maybe use ISO 8601 timestamps
    #    (`date --utc '+%Y%m%dT%H%M%SZ'`) in the directory names to recognize the
    #    latest directory.
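The "harder but atomic" option above could be sketched as follows. The `data-<timestamp>` directory naming and the `data` symlink path are illustrative assumptions, not the merged prepare_data.sh, and `mv -T` plus `date --utc` are GNU coreutils extensions:

```shell
#!/bin/bash
# Sketch: publish a freshly prepared data directory by retargeting a
# symlink, so consumers always see a complete data set.
set -Eeuo pipefail

readonly DATA_LINK='data'  # stable path that consumers read through
readonly new_dir="data-$(date --utc '+%Y%m%dT%H%M%SZ')"

tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT

# Stand-in for the real download and osrm processing steps.
mkdir "${tmp_dir}/new-data-dir"
echo 'processed' > "${tmp_dir}/new-data-dir/map-data.osm.pbf"

# 1. Move the finished data out of /tmp under a timestamped name.
mv "${tmp_dir}/new-data-dir" "${new_dir}"

# 2. Build the new symlink under a temporary name, then rename it over
#    the old one. The rename is atomic, so readers see either the old
#    target or the new one, never a missing or half-written path.
ln -s "${new_dir}" "${DATA_LINK}.tmp"
mv -T "${DATA_LINK}.tmp" "${DATA_LINK}"

# 3. Older data-* directories could be pruned here; the ISO 8601
#    timestamps sort lexicographically, so the newest sorts last.
```

Note this assumes `data` is already a symlink (or absent); an existing real `data` directory would have to be migrated to the symlink layout once, since `mv -T` refuses to overwrite a non-empty directory.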

@haphut left a comment

LGTM

@e-halinen e-halinen merged commit ccd61c0 into dev Jul 21, 2025