Improve osm_date to report date from PBF metadata #388

Open
2 of 3 tasks
rustprooflabs opened this issue May 23, 2024 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

@rustprooflabs
Owner

rustprooflabs commented May 23, 2024

Details

Current Status

Version 1.0.2

Version TBD

  • Updates with replication

Description

The osm.pgosm_flex table has an osm_date column that is intended to indicate when the data itself was current. By default, the import sets it to "today's date" according to the computer running the import. If the --pgosm-date option is used, it stores the date provided instead (e.g. 2024-05-18). Neither option is perfect, due to time zones and the actual difference between "when I downloaded the file" and "when the data was pulled from OSM."

I'd like to make the following changes.

  • Use osmium fileinfo to retrieve the timestamp from the PBF metadata when it exists
  • Fall back to the current behavior when the metadata is missing
  • Consider changing osm_date to timestamptz instead of date
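One possible reading of the list above, sketched in Python. The function name resolve_osm_date and its parameters are hypothetical and not part of PgOSM Flex. The precedence shown (metadata first, then --pgosm-date, then today's date) is an assumption, as is reducing the timestamp to a UTC date rather than storing a timestamptz.

```python
from datetime import date, datetime, timezone
from typing import Optional


def resolve_osm_date(
    pbf_timestamp: Optional[datetime],
    pgosm_date: Optional[str],
    today: Optional[date] = None,
) -> date:
    """Pick the value for osm_date (hypothetical helper).

    Assumed precedence: PBF metadata timestamp, then the --pgosm-date
    value, then today's date on the machine running the import.
    """
    if pbf_timestamp is not None:
        # Normalize to UTC so the result does not depend on the local time zone.
        return pbf_timestamp.astimezone(timezone.utc).date()
    if pgosm_date is not None:
        # Current behavior when --pgosm-date is given, e.g. "2024-05-18".
        return date.fromisoformat(pgosm_date)
    # Current default behavior: today's date.
    return today if today is not None else date.today()
```

If osm_date were changed to timestamptz, the first branch could return the full datetime instead of truncating it to a date.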

Example in DB

I ran an import on 5/18/2024 local time. The data saved in the osm.pgosm_flex table is shown by the following query.

SELECT imported, osm_date, region, pgosm_flex_version
    FROM osm.pgosm_flex
;
imported                     |osm_date  |region                   |pgosm_flex_version|
-----------------------------+----------+-------------------------+------------------+
2024-05-18 08:25:03.747 -0600|2024-05-18|north-america/us-colorado|1.0.0-c946501     |

PBF metadata

The timestamp from the PBF's metadata is reported as 2024-05-17T20:20:59Z, which is 2024-05-17T14:20:59 MDT local time for me. The date recorded by the current method is a day later than when the data was actually sourced.
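The conversion above can be checked with a few lines of Python. This is only a sketch: the fixed UTC-6 offset stands in for MDT and is an assumption made to keep the example free of time zone database dependencies.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-6 offset standing in for MDT (assumption for this example).
MDT = timezone(timedelta(hours=-6), "MDT")

# Replace the trailing "Z" so fromisoformat() also works before Python 3.11.
pbf_timestamp = datetime.fromisoformat("2024-05-17T20:20:59Z".replace("Z", "+00:00"))
local_timestamp = pbf_timestamp.astimezone(MDT)

print(local_timestamp.isoformat())  # 2024-05-17T14:20:59-06:00
print(local_timestamp.date())       # 2024-05-17, one day before the stored osm_date
```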

We should be able to run the following command and load the JSON output into Python as a dict, then extract the timestamp and/or osmosis_replication_timestamp keys.

osmium fileinfo district-of-columbia-2024-05-18.osm.pbf --json
{
    "file": {
        "name": "district-of-columbia-2024-05-18.osm.pbf",
        "format": "PBF",
        "compression": "none",
        "size": 19026604
    },
    "header": {
        "boxes": [
            [
                -77.1201,
                38.79134,
                -76.90906,
                38.99603
            ]
        ],
        "with_history": false,
        "option": {
            "generator": "osmium/1.14.0",
            "osmosis_replication_base_url": "http://download.geofabrik.de/north-america/us/district-of-columbia-updates",
            "osmosis_replication_sequence_number": "4066",
            "osmosis_replication_timestamp": "2024-05-17T20:20:59Z",
            "pbf_dense_nodes": "true",
            "pbf_optional_feature_0": "Sort.Type_then_ID",
            "sorting": "Type_then_ID",
            "timestamp": "2024-05-17T20:20:59Z"
        }
    }
}
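A minimal sketch of reading that output from Python, assuming the osmium command is available on the PATH. The function names parse_fileinfo_timestamp and get_pbf_timestamp are hypothetical. Preferring timestamp over osmosis_replication_timestamp is also an assumption; in the sample above both hold the same value.

```python
import json
import subprocess
from datetime import datetime
from typing import Optional


def parse_fileinfo_timestamp(fileinfo: dict) -> Optional[datetime]:
    """Return the data timestamp from parsed `osmium fileinfo --json` output.

    Returns None when neither key is present, so the caller can fall back
    to the current behavior.
    """
    options = fileinfo.get("header", {}).get("option", {})
    raw = options.get("timestamp") or options.get("osmosis_replication_timestamp")
    if not raw:
        return None
    # Replace the trailing "Z" so fromisoformat() also works before Python 3.11.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def get_pbf_timestamp(pbf_path: str) -> Optional[datetime]:
    """Run osmium fileinfo on a PBF file and extract its timestamp."""
    result = subprocess.run(
        ["osmium", "fileinfo", "--json", pbf_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if osmium fails
    )
    return parse_fileinfo_timestamp(json.loads(result.stdout))
```

Keeping the parsing separate from the subprocess call means the extraction logic can be tested against a saved JSON sample without osmium installed.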
@rustprooflabs rustprooflabs added the enhancement New feature or request label May 23, 2024
@rustprooflabs rustprooflabs added this to the 1.0.2 milestone May 23, 2024
@jacopofar
Contributor

Sounds good!
