Handle newline-delimited GeoJSON and large GeoJSON files #1154

Merged: 9 commits, Jan 24, 2025

@msbarry msbarry commented Jan 24, 2025

Switch from the GeoTools GeoJSON reader to a custom Jackson-streaming-based GeoJSON parser that handles newline-delimited GeoJSON and arbitrarily large input files with a minimal memory footprint.

To use it, add .addGeoJsonSource("name", Path.of("path", "to", "file.json"), "https://url/of/file.json") to a Planetiler.create instance, or use the streaming parser on its own:

var json = GeoJson.from(Path.of("file.json"));
GeoJsonFeature onlyFeature = json.stream().findFirst().get();
for (var feature : json) {
  // process each feature...
}

The new parser throws an exception and aborts if the input JSON has a syntax error, but it only logs warnings and emits empty geometries if the input is valid JSON that does not adhere to GeoJSON semantics.

See RFC 7946, which appears to be the canonical GeoJSON specification now, and this description of the newline-delimited GeoJSON format.
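
For readers unfamiliar with the newline-delimited variant: each line of the file is one complete GeoJSON feature, so a reader can hand features downstream one line at a time without holding the whole file in memory. The following is a minimal standard-library sketch of that idea only. It is not the parser from this PR (which is Jackson-streaming based), it does not parse the JSON itself, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public class NdGeoJsonLines {
  /**
   * Hypothetical helper: passes each non-blank line (one serialized feature)
   * to the handler and returns how many lines were handed over.
   * Only the current line is held in memory at any time.
   */
  static long forEachFeatureLine(Reader input, Consumer<String> handler) {
    long count = 0;
    try (BufferedReader reader = new BufferedReader(input)) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.trim().isEmpty()) {
          continue; // tolerate blank separator lines
        }
        handler.accept(line);
        count++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return count;
  }

  public static void main(String[] args) {
    // Two illustrative features, one per line.
    String sample =
      "{\"type\":\"Feature\",\"geometry\":{\"type\":\"Point\",\"coordinates\":[-71.4,41.8]},\"properties\":{}}\n"
        + "{\"type\":\"Feature\",\"geometry\":{\"type\":\"Point\",\"coordinates\":[-71.5,41.9]},\"properties\":{}}\n";
    long n = forEachFeatureLine(new StringReader(sample),
      line -> System.out.println("feature line of " + line.length() + " chars"));
    System.out.println(n + " feature lines read");
  }
}
```

A real reader would pass each line (or the underlying stream) to a JSON parser; the point here is only that the line boundary gives a natural unit for streaming.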

Testing on large GeoJSON files converted from Overture, it reads about 250k features (150MB) per second with memory usage fixed below 100MB. There's probably room for improvement, but it at least makes processing arbitrarily large files possible.
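
To put those figures in perspective, here is a small back-of-the-envelope calculation. It assumes decimal megabytes and that the reported rates hold steady for the whole file; the 10GB file size is a made-up example, not a measurement from this PR.

```java
public class ThroughputEstimate {
  // Rates reported in the PR description (approximate).
  static final double FEATURES_PER_SECOND = 250_000;
  static final double BYTES_PER_SECOND = 150_000_000; // 150MB, assuming decimal MB

  /** Average serialized size of one feature implied by the two rates. */
  static double averageBytesPerFeature() {
    return BYTES_PER_SECOND / FEATURES_PER_SECOND;
  }

  /** Estimated read time in seconds for a file of the given size. */
  static double secondsToRead(double fileBytes) {
    return fileBytes / BYTES_PER_SECOND;
  }

  public static void main(String[] args) {
    System.out.printf("average bytes per feature: %.0f%n", averageBytesPerFeature());
    // Hypothetical 10GB input file.
    System.out.printf("seconds to read 10GB: %.1f%n", secondsToRead(10e9));
  }
}
```

Under these assumptions each feature averages about 600 bytes, and a 10GB file would take a little over a minute to read.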

NOTE: crs was removed from the spec, so this parser assumes all GeoJSON input files use WGS84 with longitude first and latitude second. 3D coordinates are ignored.
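
A short sketch of the coordinate convention described in the note, reading "3D coordinates are ignored" as "any value after longitude and latitude is dropped" (that reading is an assumption). The class and method names are hypothetical and this is not code from the PR. Unlike the parser, which logs a warning and emits an empty geometry for semantic problems, this sketch simply throws on a position that is too short.

```java
public class LonLatOrder {
  /** Hypothetical holder for a 2D WGS84 position. */
  static final class LonLat {
    final double lon;
    final double lat;

    LonLat(double lon, double lat) {
      this.lon = lon;
      this.lat = lat;
    }
  }

  /**
   * Interprets a GeoJSON position array as [longitude, latitude, ...]
   * and discards anything after the first two values (e.g. elevation).
   */
  static LonLat fromPosition(double[] position) {
    if (position.length < 2) {
      throw new IllegalArgumentException("a position needs at least longitude and latitude");
    }
    return new LonLat(position[0], position[1]);
  }

  public static void main(String[] args) {
    // Illustrative position with a third (elevation) value that gets dropped.
    LonLat p = fromPosition(new double[] {-71.40015, 41.82864, 12.0});
    System.out.println("lon=" + p.lon + " lat=" + p.lat);
  }
}
```

The practical consequence is that input written as latitude, longitude will be silently misplaced, so files in another order or CRS need to be converted before being passed in.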


github-actions bot commented Jan 24, 2025

This Branch (a41f3fd) vs. Base (6178df0); the two benchmark logs below appear in that order.
0:01:06 DEB [archive] - Tile stats:
0:01:06 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (157k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:85k)
2. 9/154/190 (144k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (135k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:71k)
5. 14/4941/6092 (113k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (112k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (101k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (95k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:06 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  151   336   409   544   872   332   437   552   802  1.6k    2k  6.9k  6.2k  5.6k  4.5k  6.9k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   640   714    1k  1.6k  3.1k  5.8k  3.4k  1.7k   803   948  5.8k
            landuse    0     0     0     0   549   695  1.6k  6.7k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   313   776  1.2k    4k  5.6k   17k   13k   17k   62k   47k   33k   62k
           waterway    0     0     0     0   112   119     0     0     0    3k  2.3k    2k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.3k  4.3k  9.7k   18k   13k  8.2k  3.7k  3.4k  4.4k   18k
transportation_name    0     0     0     0     0     0   287   364  1.1k  1.9k  5.5k  4.7k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.9k   29k   85k   71k   81k   53k   30k   25k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   328   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.8k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   568   565   85k   85k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.4k  3.7k    6k   20k   41k   82k  195k  181k  134k  113k  127k  247k  247k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   59k  144k  136k   98k   83k   91k  157k  157k
0:01:06 DEB [archive] -    Max tile: 247k (gzipped: 157k)
0:01:06 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:06 DEB [archive] -     # tiles: 4,115,039
0:01:06 DEB [archive] -  # features: 5,519,402
0:01:06 INF [archive] - Finished in 19s cpu:1m11s avg:3.7
0:01:06 INF [archive] -   read    1x(3% 0.6s wait:17s done:1s)
0:01:06 INF [archive] -   encode  4x(56% 11s wait:2s done:1s)
0:01:06 INF [archive] -   write   1x(22% 4s wait:13s)
0:01:06 INF [archive] - Finished in 1m7s cpu:3m39s gc:1s avg:3.3
0:01:06 INF [archive] - FINISHED!
0:01:06 INF [archive] - 
0:01:06 INF [archive] - ----------------------------------------
0:01:06 INF [archive] - data errors:
0:01:06 INF [archive] - 	render_snap_fix_input	16,734
0:01:06 INF [archive] - 	osm_multipolygon_missing_way	360
0:01:06 INF [archive] - 	osm_boundary_missing_way	55
0:01:06 INF [archive] - 	merge_snap_fix_input	12
0:01:06 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:06 INF [archive] - 	render_snap_fix_input2	1
0:01:06 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:06 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:06 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:06 INF [archive] - ----------------------------------------
0:01:06 INF [archive] - 	overall          1m7s cpu:3m39s gc:1s avg:3.3
0:01:06 INF [archive] - 	lake_centerlines 3s cpu:6s avg:2.1
0:01:06 INF [archive] - 	  read     1x(18% 0.5s done:2s)
0:01:06 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:06 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:06 INF [archive] - 	water_polygons   15s cpu:41s avg:2.8
0:01:06 INF [archive] - 	  read     1x(41% 6s done:7s)
0:01:06 INF [archive] - 	  process  4x(27% 4s wait:4s done:5s)
0:01:06 INF [archive] - 	  write    1x(4% 0.5s wait:9s done:5s)
0:01:06 INF [archive] - 	natural_earth    6s cpu:13s avg:2.1
0:01:06 INF [archive] - 	  read     1x(95% 6s)
0:01:06 INF [archive] - 	  process  4x(13% 0.8s wait:6s)
0:01:06 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:06 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.2
0:01:06 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:06 INF [archive] - 	  parse    4x(33% 0.6s)
0:01:06 INF [archive] - 	  process  1x(70% 1s)
0:01:06 INF [archive] - 	osm_pass2        19s cpu:1m16s avg:3.9
0:01:06 INF [archive] - 	  read     1x(0% 0s wait:11s done:8s)
0:01:06 INF [archive] - 	  process  4x(75% 14s)
0:01:06 INF [archive] - 	  write    1x(2% 0.4s wait:19s)
0:01:06 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:06 INF [archive] - 	boundaries       0s cpu:0s avg:1.3
0:01:06 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:06 INF [archive] - 	sort             1s cpu:4s avg:2.6
0:01:06 INF [archive] - 	  worker  1x(52% 0.7s)
0:01:06 INF [archive] - 	archive          19s cpu:1m11s avg:3.7
0:01:06 INF [archive] - 	  read    1x(3% 0.6s wait:17s done:1s)
0:01:06 INF [archive] - 	  encode  4x(56% 11s wait:2s done:1s)
0:01:06 INF [archive] - 	  write   1x(22% 4s wait:13s)
0:01:06 INF [archive] - ----------------------------------------
0:01:06 INF [archive] - 	archive	108MB
0:01:06 INF [archive] - 	features	284MB
-rw-r--r-- 1 runner docker 87M Jan 24 13:28 run.jar
0:01:05 DEB [archive] - Tile stats:
0:01:05 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (157k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:85k)
2. 9/154/190 (144k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (135k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:71k)
5. 14/4941/6092 (113k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (112k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (101k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (95k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:05 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  151   336   409   544   872   332   437   552   802  1.6k    2k  6.9k  6.2k  5.6k  4.5k  6.9k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   640   714    1k  1.6k  3.1k  5.8k  3.4k  1.7k   803   948  5.8k
            landuse    0     0     0     0   549   695  1.6k  6.7k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   313   776  1.2k    4k  5.6k   17k   13k   17k   62k   47k   33k   62k
           waterway    0     0     0     0   112   119     0     0     0    3k  2.3k    2k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.3k  4.3k  9.7k   18k   13k  8.2k  3.7k  3.4k  4.4k   18k
transportation_name    0     0     0     0     0     0   287   364  1.1k  1.9k  5.5k  4.7k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.9k   29k   85k   71k   81k   53k   30k   25k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   328   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.8k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   568   565   85k   85k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.4k  3.7k    6k   20k   41k   82k  195k  181k  134k  113k  127k  247k  247k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   59k  144k  136k   98k   83k   91k  157k  157k
0:01:05 DEB [archive] -    Max tile: 247k (gzipped: 157k)
0:01:05 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:05 DEB [archive] -     # tiles: 4,115,039
0:01:05 DEB [archive] -  # features: 5,519,402
0:01:05 INF [archive] - Finished in 19s cpu:1m11s avg:3.7
0:01:05 INF [archive] -   read    1x(3% 0.5s wait:17s done:1s)
0:01:05 INF [archive] -   encode  4x(57% 11s wait:2s done:1s)
0:01:05 INF [archive] -   write   1x(22% 4s wait:13s)
0:01:05 INF [archive] - Finished in 1m5s cpu:3m35s gc:1s avg:3.3
0:01:05 INF [archive] - FINISHED!
0:01:05 INF [archive] - 
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - data errors:
0:01:05 INF [archive] - 	render_snap_fix_input	16,734
0:01:05 INF [archive] - 	osm_multipolygon_missing_way	360
0:01:05 INF [archive] - 	osm_boundary_missing_way	55
0:01:05 INF [archive] - 	merge_snap_fix_input	12
0:01:05 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:05 INF [archive] - 	render_snap_fix_input2	1
0:01:05 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:05 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:05 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - 	overall          1m5s cpu:3m35s gc:1s avg:3.3
0:01:05 INF [archive] - 	lake_centerlines 2s cpu:5s avg:2.4
0:01:05 INF [archive] - 	  read     1x(23% 0.5s done:2s)
0:01:05 INF [archive] - 	  process  4x(0% 0s done:1s)
0:01:05 INF [archive] - 	  write    1x(0% 0s done:1s)
0:01:05 INF [archive] - 	water_polygons   15s cpu:41s avg:2.7
0:01:05 INF [archive] - 	  read     1x(41% 6s done:7s)
0:01:05 INF [archive] - 	  process  4x(26% 4s wait:4s done:5s)
0:01:05 INF [archive] - 	  write    1x(3% 0.5s wait:9s done:5s)
0:01:05 INF [archive] - 	natural_earth    6s cpu:13s avg:2
0:01:05 INF [archive] - 	  read     1x(96% 6s)
0:01:05 INF [archive] - 	  process  4x(12% 0.8s wait:6s)
0:01:05 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:05 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.2
0:01:05 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:05 INF [archive] - 	  parse    4x(33% 0.7s)
0:01:05 INF [archive] - 	  process  1x(69% 1s)
0:01:05 INF [archive] - 	osm_pass2        19s cpu:1m14s avg:4
0:01:05 INF [archive] - 	  read     1x(0% 0s wait:11s done:8s)
0:01:05 INF [archive] - 	  process  4x(76% 14s)
0:01:05 INF [archive] - 	  write    1x(2% 0.4s wait:18s)
0:01:05 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:05 INF [archive] - 	boundaries       0s cpu:0s avg:1.4
0:01:05 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:05 INF [archive] - 	sort             1s cpu:3s avg:2.4
0:01:05 INF [archive] - 	  worker  1x(50% 0.7s)
0:01:05 INF [archive] - 	archive          19s cpu:1m11s avg:3.7
0:01:05 INF [archive] - 	  read    1x(3% 0.5s wait:17s done:1s)
0:01:05 INF [archive] - 	  encode  4x(57% 11s wait:2s done:1s)
0:01:05 INF [archive] - 	  write   1x(22% 4s wait:13s)
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - 	archive	108MB
0:01:05 INF [archive] - 	features	284MB
-rw-r--r-- 1 runner docker 88M Jan 24 13:29 run.jar

Full logs: https://github.com/onthegomap/planetiler/actions/runs/12950545460

@msbarry msbarry merged commit 8cb867f into main Jan 24, 2025
13 checks passed