
Improve osm_date to report date from PBF metadata #388

@rustprooflabs


Details

Current Status

Version 1.0.2

Version TBD

  • Updates with replication

Description

The osm.pgosm_flex table tracks the osm_date column, which is intended to indicate when the data itself is from. By default, the current behavior records "today's date" according to the computer running the import. If the --pgosm-date option is used, the provided date is recorded instead (e.g. 2024-05-18). Neither option is perfect, because of time zones and the real difference between "when I downloaded the file" and "when the data was pulled from OSM."

I'd like to make the following changes.

  • Use osmium fileinfo to retrieve the timestamp from the PBF metadata when it exists
  • Fall back to current behavior when metadata missing
  • Consider changing osm_date to timestamptz instead of date
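The fallback order in the list above could look roughly like the sketch below. It is a hypothetical helper (resolve_osm_date is not an existing PgOSM Flex function) that assumes the PBF timestamp has already been read and parsed into a timezone-aware datetime, or is None when the metadata is missing. It returns an aware datetime so the result maps naturally onto a timestamptz column. Using midnight UTC when only a date is known is an assumption, not current behavior.

```python
from datetime import date, datetime, timezone
from typing import Optional


def resolve_osm_date(
    pbf_ts: Optional[datetime],
    pgosm_date: Optional[str] = None,
    today: Optional[date] = None,
) -> datetime:
    """Hypothetical helper choosing the value for osm.pgosm_flex.osm_date.

    Order: PBF metadata timestamp, then --pgosm-date, then today's date.
    pbf_ts is expected to be timezone-aware (or None if metadata is missing).
    """
    if pbf_ts is not None:
        # Metadata exists: normalize to UTC and use it as-is.
        return pbf_ts.astimezone(timezone.utc)
    if pgosm_date:
        # Current behavior when --pgosm-date is given, e.g. "2024-05-18".
        d = date.fromisoformat(pgosm_date)
    else:
        # Current default: the date on the machine running the import.
        d = today or date.today()
    # Assumption: no time of day is known, so use midnight UTC.
    return datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
```

With metadata present, resolve_osm_date(datetime(2024, 5, 17, 20, 20, 59, tzinfo=timezone.utc), "2024-05-18") keeps the PBF timestamp and ignores the command-line date; with pbf_ts=None it falls back exactly as the import does today.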

Example in DB

I ran an import on 5/18/2024 (local time). The following query shows the data saved in the osm.pgosm_flex table.

SELECT imported, osm_date, region, pgosm_flex_version
    FROM osm.pgosm_flex
;
imported                     |osm_date  |region                   |pgosm_flex_version|
-----------------------------+----------+-------------------------+------------------+
2024-05-18 08:25:03.747 -0600|2024-05-18|north-america/us-colorado|1.0.0-c946501     |

PBF metadata

The timestamp in the PBF metadata is reported as 2024-05-17T20:20:59Z, which is 2024-05-17T14:20:59 MDT local time for me. The date recorded by the current method is therefore a day later than when the data was actually sourced.
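The one-day offset can be reproduced with the standard library. This sketch uses a fixed UTC-6 offset to stand in for MDT (an assumption for illustration; zoneinfo.ZoneInfo("America/Denver") would handle daylight saving properly) and compares the PBF date with the 2024-05-18 date recorded by the import shown above.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed UTC-6 offset standing in for MDT in this example.
MDT = timezone(timedelta(hours=-6), "MDT")

# Timestamp from the PBF header: 2024-05-17T20:20:59Z.
pbf_ts = datetime(2024, 5, 17, 20, 20, 59, tzinfo=timezone.utc)
local_ts = pbf_ts.astimezone(MDT)

# Date recorded in osm.pgosm_flex.osm_date by the import above.
import_date = date(2024, 5, 18)
offset_days = (import_date - pbf_ts.date()).days

print(local_ts.isoformat())  # 2024-05-17T14:20:59-06:00
print(offset_days)           # 1
```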

We should be able to run the following command from Python, load the JSON it returns into a dict, and extract the timestamp and/or osmosis_replication_timestamp keys.

osmium fileinfo district-of-columbia-2024-05-18.osm.pbf --json
{
    "file": {
        "name": "district-of-columbia-2024-05-18.osm.pbf",
        "format": "PBF",
        "compression": "none",
        "size": 19026604
    },
    "header": {
        "boxes": [
            [
                -77.1201,
                38.79134,
                -76.90906,
                38.99603
            ]
        ],
        "with_history": false,
        "option": {
            "generator": "osmium/1.14.0",
            "osmosis_replication_base_url": "http://download.geofabrik.de/north-america/us/district-of-columbia-updates",
            "osmosis_replication_sequence_number": "4066",
            "osmosis_replication_timestamp": "2024-05-17T20:20:59Z",
            "pbf_dense_nodes": "true",
            "pbf_optional_feature_0": "Sort.Type_then_ID",
            "sorting": "Type_then_ID",
            "timestamp": "2024-05-17T20:20:59Z"
        }
    }
}
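A minimal sketch of that approach, assuming osmium is on the PATH and the output has the header.option layout shown above. The function names are hypothetical, not part of PgOSM Flex. It prefers the timestamp key and falls back to osmosis_replication_timestamp, returning None when neither exists so the caller can fall back to the current behavior.

```python
import json
import subprocess
from datetime import datetime
from typing import Optional


def read_fileinfo(pbf_path: str) -> dict:
    """Run `osmium fileinfo <file> --json` and return the parsed JSON."""
    result = subprocess.run(
        ["osmium", "fileinfo", pbf_path, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if osmium fails
    )
    return json.loads(result.stdout)


def parse_utc(value: str) -> datetime:
    """Parse a UTC string such as '2024-05-17T20:20:59Z' into an aware datetime."""
    # datetime.fromisoformat() accepts a trailing 'Z' only on Python 3.11+,
    # so rewrite it as an explicit +00:00 offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pbf_timestamp(fileinfo: dict) -> Optional[datetime]:
    """Return the header timestamp, or None when the metadata is missing."""
    options = fileinfo.get("header", {}).get("option", {})
    for key in ("timestamp", "osmosis_replication_timestamp"):
        value = options.get(key)
        if value:
            return parse_utc(value)
    return None
```

Usage would be pbf_timestamp(read_fileinfo("district-of-columbia-2024-05-18.osm.pbf")), which for the file above should yield 2024-05-17 20:20:59 UTC. Parsing is kept separate from the subprocess call so it can be tested without osmium installed.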

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests