Commit a09d8f0

📝 Add file formats for geodata
1 parent d393491 commit a09d8f0

1 file changed

+110
-0
lines changed

docs/data-processing/geodata.rst

Lines changed: 110 additions & 0 deletions
@@ -5,6 +5,112 @@
55
Geodata
66
=======
77

8+
File formats
9+
------------
10+
11+
.. _pmtiles:
12+
13+
PMTiles
14+
~~~~~~~
15+
16+
`PMTiles <https://docs.protomaps.com>`_ is a general format for tile data
17+
addressed by Z/X/Y coordinates. This can be cartographic vector tiles,
18+
:ref:`remote sensing data <remote-sensing>`, JPEG images or similar.
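As an illustration of Z/X/Y addressing, the following minimal sketch converts a WGS 84 longitude and latitude into tile coordinates. It assumes the common Web Mercator tiling scheme with the origin in the top-left corner; the function name ``lonlat_to_tile`` is hypothetical and not part of any PMTiles library.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int, int]:
    """Return the (z, x, y) tile containing a WGS 84 point.

    Assumes the Web Mercator (slippy map) scheme with y counted from the top.
    """
    n = 2**zoom  # number of tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon=180 or extreme latitudes stay inside the grid
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return zoom, x, y


# Example: the tile covering a point in Berlin at zoom level 10
print(lonlat_to_tile(13.405, 52.52, 10))
```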
19+
20+
`HTTP Range Requests
21+
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests>`_ are used
22+
for reading in order to retrieve only the relevant tiles or metadata within a
23+
PMTiles archive. The arrangement of tiles and directories is designed to
24+
minimise the number of requests when moving and zooming.
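To make the mechanism concrete, here is a minimal sketch of how a client could ask for a byte range instead of the whole file. It only builds the request with the standard library and does not send it; the URL, the helper name ``range_request`` and the 16384-byte length are illustrative assumptions, not values prescribed by the text above.

```python
from urllib.request import Request


def range_request(url: str, offset: int, length: int) -> Request:
    """Build (but do not send) a GET request for ``length`` bytes at ``offset``."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the "- 1"
    last = offset + length - 1
    return Request(url, headers={"Range": f"bytes={offset}-{last}"})


# Hypothetical archive URL; the length is an arbitrary illustrative value
req = range_request("https://example.com/tiles.pmtiles", 0, 16384)
print(req.get_header("Range"))  # bytes=0-16383
```

A server that supports range requests answers such a request with status ``206 Partial Content`` and only the requested bytes.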
25+
26+
However, PMTiles is a read-only format: it is not possible to update part of the
27+
archive without rewriting the entire file. If you need transactional updates,
28+
you should use a database such as SQLite or :doc:`postgresql/postgis/index` and
29+
`ST_AsMVT <https://postgis.net/docs/ST_AsMVT.html>`_.
30+
31+
.. seealso::
32+
* `GitHub Repository <https://github.com/protomaps/PMTiles>`_
33+
* `PMTiles Version 3 Specification
34+
<https://github.com/protomaps/PMTiles/blob/main/spec/v3/spec.md>`_
35+
* `pmtiles Python package
36+
<https://github.com/protomaps/PMTiles/tree/main/python/pmtiles>`_
37+
38+
Mapbox Vector Tiles (MVT)
39+
~~~~~~~~~~~~~~~~~~~~~~~~~
40+
41+
The `Mapbox Vector Tiles
42+
<https://docs.mapbox.com/data/tilesets/guides/vector-tiles-standards/>`_ file
43+
format stores each tile in a directory tree like :file:`/Z/X/Y.mvt`. This works
44+
well for small tile sets, but updating an entire global pyramid of ~300 million
45+
tiles is very inefficient. :ref:`pmtiles`, on the other hand, is a single file
46+
with tiles de-duplicated, reducing the size of global vector basemaps by ~70%.
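The order of magnitude of the tile count can be checked with a little arithmetic: each zoom level ``z`` contains ``4**z`` tiles. The text does not say which zoom levels the figure refers to; the sketch below assumes a pyramid from zoom 0 to 14, which gives a number of the same order.

```python
def tiles_in_pyramid(max_zoom: int) -> int:
    """Count all tiles from zoom level 0 up to and including ``max_zoom``."""
    # Each zoom level splits every tile into four, so level z has 4**z tiles
    return sum(4**z for z in range(max_zoom + 1))


# Assumption: a global pyramid covering zoom levels 0 to 14
print(tiles_in_pyramid(14))  # 357913941
```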
47+
48+
For writing, the :ref:`gdal` library with `SQLite <https://www.sqlite.org>`_ and
49+
`GEOS <https://libgeos.org>`_ support must be installed. The tiles can also be
50+
stored in an SQLite-based :ref:`mbtiles` container and processed with the MBTiles driver.
51+
52+
.. seealso::
53+
* `Mapbox Vector Tile specification
54+
<https://github.com/mapbox/vector-tile-spec>`_
55+
* `MVT: Mapbox Vector Tiles
56+
<https://gdal.org/en/stable/drivers/vector/mvt.html>`_
57+
58+
.. _mbtiles:
59+
60+
MBTiles
61+
~~~~~~~
62+
63+
`MBTiles <https://docs.mapbox.com/help/glossary/mbtiles/>`_ is a container
64+
format for tile data based on SQLite. It is optimised for local access, whereas
65+
:ref:`pmtiles` is optimised for access via HTTP.
66+
67+
.. seealso::
68+
* `MBTiles specification <https://github.com/mapbox/mbtiles-spec>`_
69+
70+
.. _cog:
71+
72+
Cloud Optimized GeoTIFF (COG)
73+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
74+
75+
`Cloud Optimized GeoTIFF <https://cogeo.org>`_ is a raster TIFF file that, like
76+
:ref:`pmtiles`, is optimised for reading from cloud storage. :ref:`pmtiles`
77+
can also deliver other tile data, for example vector tiles. However, COG is
78+
backwards compatible with most GIS programs that work with GeoTIFF.
79+
80+
.. seealso::
81+
* `OGC Cloud Optimized GeoTIFF Standard
82+
<https://docs.ogc.org/is/21-026/21-026.html>`_
83+
84+
.. _geoparquet:
85+
86+
GeoParquet
87+
~~~~~~~~~~
88+
89+
`Parquet <https://parquet.apache.org>`_ is an open-source, column-orientated
90+
data file format that was developed for the efficient storage and retrieval of
91+
data. It offers efficient data compression and encoding methods with optimised
92+
processing of large, complex data. `GeoParquet <https://geoparquet.org>`_
93+
extends Parquet with interoperable geodata types (point, line, polygon).
94+
95+
96+
* :doc:`pyviz:matplotlib/geopandas/index` supports the `reading
97+
<https://geopandas.org/en/stable/docs/reference/api/geopandas.read_parquet.html>`_
98+
and `writing
99+
<https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_parquet.html>`_
100+
of GeoParquet.
101+
* `GeoParquet Downloader Plugin
102+
<https://plugins.qgis.org/plugins/qgis_plugin_gpq_downloader/>`_ for `QGIS
103+
<https://qgis.org>`_ enables streaming downloads of large GeoParquet datasets.
104+
* `DuckDB <https://duckdb.org>`_ allows the reading and writing of GeoParquet
105+
files with the `Spatial Extension
106+
<https://duckdb.org/docs/stable/extensions/spatial/overview.html>`_.
107+
108+
.. seealso::
109+
* `GeoParquet specification <https://github.com/opengeospatial/geoparquet>`_
110+
* `GeoParquet Software <https://geoparquet.org/#implementations>`_
111+
* `validate_geoparquet.py
112+
<https://github.com/OSGeo/gdal/blob/master/swig/python/gdal-utils/osgeo_utils/samples/validate_geoparquet.py>`_
113+
8114
.. _geodata-repositories:
9115

10116
Data repositories
@@ -30,6 +136,8 @@ Software
30136
Reading and writing
31137
~~~~~~~~~~~~~~~~~~~
32138

139+
.. _gdal:
140+
33141
`Geospatial Data Abstraction Library (GDAL) <https://gdal.org/en/latest/>`_
34142
provides a low-level but more powerful API for reading and writing hundreds
35143
of data formats.
@@ -137,6 +245,8 @@ Reading and writing
137245
.. seealso::
138246
:ref:`geo-wrappers`
139247

248+
.. _remote-sensing:
249+
140250
Remote sensing
141251
~~~~~~~~~~~~~~
142252
