GEOMESA-3259 FSDS - Add support for GeoParquet #3064
base: main
adeet1 commented Mar 20, 2024 (edited)
- Create a bounding box for each geometry, and add it to the GeoParquet metadata (which requires the metadata map to be changed to a mutable data structure)
- Read and write all geometry attributes as binary (a primitive Parquet type) instead of as a pair of x/y doubles (a group Parquet type), using the same converter and attribute writer for all geometry types, while also maintaining backwards compatibility
- Add support for parsing WKB bytes in the Parquet geometry transformer functions
- Use a spatial index instead of a GeoTools filter for bounding box queries
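As a rough illustration of the first item, the sketch below folds per-geometry bounding boxes into a single `bbox` and places it in a GeoParquet-style `geo` metadata document. It is written in Python for brevity and is not the PR's Scala code: the function names and the `geom` column name are hypothetical, the `version` value is an assumption, and geometries are simplified to lists of `(x, y)` tuples.

```python
import json


def bounds(coords):
    """Bounding box [xmin, ymin, xmax, ymax] of a non-empty list of (x, y) tuples."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return [min(xs), min(ys), max(xs), max(ys)]


def expand(bbox, other):
    """Grow bbox so that it also covers other; bbox may be None (nothing seen yet)."""
    if bbox is None:
        return list(other)
    return [
        min(bbox[0], other[0]),
        min(bbox[1], other[1]),
        max(bbox[2], other[2]),
        max(bbox[3], other[3]),
    ]


def geo_metadata(column, geometries, geometry_types):
    """Build a GeoParquet-style 'geo' JSON string for one geometry column.

    Assumption: a single geometry column, WKB-encoded, version "1.0.0".
    """
    bbox = None
    for coords in geometries:
        if coords:  # skip empty geometries
            bbox = expand(bbox, bounds(coords))
    col = {"encoding": "WKB", "geometry_types": sorted(geometry_types)}
    if bbox is not None:
        col["bbox"] = bbox
    return json.dumps(
        {"version": "1.0.0", "primary_column": column, "columns": {column: col}}
    )


# Hypothetical usage: one point and one two-vertex line.
print(geo_metadata("geom", [[(10.0, 20.0)], [(-5.0, 30.0), (0.0, 25.0)]],
                   {"Point", "LineString"}))
```

Building a fresh dictionary per file, rather than mutating a shared one, mirrors the later commit that creates a new metadata map instance when adding the bounding box.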
To-do items:
...parquet/src/main/scala/org/locationtech/geomesa/convert/parquet/ParquetFunctionFactory.scala
...t-parquet/src/test/scala/org/locationtech/geomesa/convert/parquet/ParquetConverterTest.scala
...on/src/main/scala/org/locationtech/geomesa/fs/storage/common/AbstractFileSystemStorage.scala
...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala
...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala
...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala
...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala
...src/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureReadSupport.scala
...src/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureReadSupport.scala
...rc/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureWriteSupport.scala
...rc/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureWriteSupport.scala
...arquet/src/main/scala/org/locationtech/geomesa/convert/parquet/ParquetConverterFactory.scala
...age-parquet/src/main/scala/org/locationtech/geomesa/fs/storage/parquet/FilterConverter.scala
...et/src/main/scala/org/locationtech/geomesa/fs/storage/parquet/ParquetFileSystemStorage.scala
...et/src/main/scala/org/locationtech/geomesa/fs/storage/parquet/ParquetFileSystemStorage.scala
.../src/main/scala/org/locationtech/geomesa/fs/storage/parquet/SimpleFeatureParquetWriter.scala
...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala
...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala
...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala
...c/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureParquetSchema.scala
...rc/main/scala/org/locationtech/geomesa/fs/storage/parquet/io/SimpleFeatureWriteSupport.scala
...esa-fs-storage/geomesa-fs-storage-parquet/src/test/resources/geoparquet-metadata-schema.json
...on/src/main/scala/org/locationtech/geomesa/fs/storage/common/AbstractFileSystemStorage.scala
- When we compact GeoParquet files in a filesystem partition, we need to ensure that the bounding boxes in the metadata of the files get merged correctly (i.e. assert that the union of bounding boxes of the files before compaction is equal to the union of bounding boxes of the newly compacted files).
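The check described above can be sketched as follows. This is Python for illustration only, not the PR's test code: the helper names are hypothetical, and each file's bounding box is assumed to be available as `[xmin, ymin, xmax, ymax]`, with an empty list standing in for a file that has no geometries.

```python
def union(boxes):
    """Union of [xmin, ymin, xmax, ymax] boxes; empty boxes are ignored."""
    boxes = [b for b in boxes if b]
    if not boxes:
        return []
    return [
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    ]


def assert_bounds_preserved(before, after):
    """Fail if compaction changed the overall extent of a partition."""
    expected, actual = union(before), union(after)
    assert expected == actual, f"bounds changed: {expected} != {actual}"


# Hypothetical partition: three files before compaction, one file after.
before = [[-10.0, -5.0, 0.0, 5.0], [2.0, 1.0, 8.0, 9.0], []]
after = [[-10.0, -5.0, 8.0, 9.0]]
assert_bounds_preserved(before, after)
```

Comparing unions rather than individual boxes keeps the check independent of how many files the compaction produces.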
commit 0ea8bff
Author: adeet1 <[email protected]>
Date: Fri Mar 29 20:29:40 2024 +0000

    Optimize imports

commit 9ebd85a
Author: adeet1 <[email protected]>
Date: Fri Mar 29 20:12:03 2024 +0000

    Initialize bounds as an empty array instead of null

    * This fixes a failing unit test "suppress or allow empty output files" in ExportCommandTest.scala

commit 4cff76a
Author: adeet1 <[email protected]>
Date: Fri Mar 29 15:18:09 2024 +0000

    Split Parquet and Orc file compaction tests in order to differentiate the comparisons

commit 16d88fd
Author: adeet1 <[email protected]>
Date: Wed Mar 27 20:48:07 2024 +0000

    Assert in each partition that GeoParquet metadata bounding boxes across files are correctly merged upon compaction

    * Write features with different geometries and coordinates, so we can test the merging of unique bounding boxes.

commit 4197e4d
Author: adeet1 <[email protected]>
Date: Thu Mar 28 21:27:17 2024 +0000

    Change thunk to lazy vals

commit 4eaf9fc
Author: adeet1 <[email protected]>
Date: Thu Mar 28 20:22:10 2024 +0000

    Implement methods instead of lazy vals

commit c82c0d2
Author: adeet1 <[email protected]>
Date: Thu Mar 28 20:13:56 2024 +0000

    Move test scope

commit 09588e8
Author: adeet1 <[email protected]>
Date: Thu Mar 28 20:01:00 2024 +0000

    Don't create a GeoParquet metadata string if the SFT has no geometries

commit 137dcb5
Author: adeet1 <[email protected]>
Date: Thu Mar 28 19:36:31 2024 +0000

    Re-implement GeoParquet metadata logic to work for SFTs with multiple geometries

commit 360c2c7
Author: adeet1 <[email protected]>
Date: Thu Mar 28 16:58:26 2024 +0000

    Change back to GroupReadSupport

    * This simply checks if the Parquet file is valid - it won't deserialize/manifest everything and thus saves us some processing

commit 3bce59e
Author: adeet1 <[email protected]>
Date: Thu Mar 28 14:39:34 2024 +0000

    Use the released GeoParquet metadata schema, not the dev one

commit 878abb5
Author: adeet1 <[email protected]>
Date: Thu Mar 28 14:30:35 2024 +0000

    Optimize imports

commit d49fc3a
Author: adeet1 <[email protected]>
Date: Wed Mar 27 14:47:54 2024 +0000

    Assert that the bounding box in the GeoParquet metadata is correct

commit 2ae9574
Author: adeet1 <[email protected]>
Date: Tue Mar 26 23:14:46 2024 +0000

    Instantiate the observer directly in SimpleFeatureWriteSupport instead of passing it down from SimpleFeatureParquetWriter

commit 9770a3a
Author: adeet1 <[email protected]>
Date: Fri Mar 22 14:09:05 2024 +0000

    Tweak targetSize

commit 604e614
Author: adeet1 <[email protected]>
Date: Wed Mar 20 19:55:59 2024 +0000

    Assert that the file metadata adheres to the GeoParquet metadata json schema

commit 2257d6c
Author: adeet1 <[email protected]>
Date: Thu Mar 21 22:03:29 2024 +0000

    Deprecate the ParquetFunctionFactory class, but provide backwards compatibility

commit 03e699f
Author: adeet1 <[email protected]>
Date: Thu Mar 21 20:04:43 2024 +0000

    Create a new metadata map instance when adding bounding box

commit 8630eed
Author: adeet1 <[email protected]>
Date: Thu Mar 21 18:07:30 2024 +0000

    Change BoundsObserver argument back to FileSystemObserver

commit 921274b
Author: adeet1 <[email protected]>
Date: Thu Mar 21 17:53:38 2024 +0000

    If the sft has no geometry field, then omit the GeoParquet metadata entirely

commit c1dda99
Author: adeet1 <[email protected]>
Date: Thu Mar 21 17:51:26 2024 +0000

    Omit orientation, edges and epoch

commit dabdc43
Author: adeet1 <[email protected]>
Date: Thu Mar 21 17:39:47 2024 +0000

    Make variables private to avoid exposing mutable state outside the scope of the class

commit 5eecf48
Author: adeet1 <[email protected]>
Date: Thu Mar 21 17:32:01 2024 +0000

    Delete redundant checks in geometry read and write support

commit 0ed5c65
Author: adeet1 <[email protected]>
Date: Thu Mar 21 14:55:29 2024 +0000

    Delete duplicate dependency

commit 3dc798d
Author: adeet1 <[email protected]>
Date: Wed Mar 20 19:09:44 2024 +0000

    Support backwards compatibility for FilterConverter

commit 7dea125
Author: adeet1 <[email protected]>
Date: Wed Mar 20 15:32:31 2024 +0000

    Delete .parquet.crc file after running tests

commit 652bf3a
Author: Adeet Patel <[email protected]>
Date: Mon Feb 12 12:16:35 2024 -0500

    GEOMESA-3259 FSDS - Add support for GeoParquet

    * Create a BoundsObserver trait, and tweak various classes and methods to use that trait
    * Add an observer to the SimpleFeatureParquetWriter and write records to it, in order to create a bounding box of all the geometries. Add this bounding box to the GeoParquet metadata (which requires the metadata map to be changed to a mutable data structure).
    * Read/write all geometry attributes in binary (a primitive Parquet type) instead of as a pair of x/y doubles (a group Parquet type), using the same converter and attribute writer for all geometry types, while also maintaining backwards compatibility
    * Add support for parsing WKB bytes in the Parquet geometry transformer functions
    * Exclude bounding box from the GeoTools filter and use a spatial index instead

    Co-authored-by: Emilio Lahr-Vivaz <[email protected]>
...on/src/main/scala/org/locationtech/geomesa/fs/storage/common/AbstractFileSystemStorage.scala
With the release of GeoParquet 1.1.0, we no longer need to encode everything as WKB: our regular encoding matches (or almost matches, in some cases) the "native" GeoParquet encoding. It may still be useful to support WKB encoding, though, since "native" encoding support doesn't seem to be widely adopted yet.
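To make the two representations concrete, here is a small sketch of the same point as WKB bytes and as a pair of x/y doubles, which is the shape the PR description gives for the regular (group-type) encoding. It is Python for illustration only, not GeoMesa code; the function names are hypothetical and only 2D points are handled.

```python
import struct

WKB_POINT = 1  # geometry type code for Point in the WKB format


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # byte order flag (1 = little endian), uint32 geometry type, two doubles
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode a 2D WKB point, honoring the byte order flag in the first byte."""
    order = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(order + "Idd", data[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"not a 2D point: WKB type {geom_type}")
    return x, y


def point_to_xy(x, y):
    """The same point as a pair of named doubles (an x/y group)."""
    return {"x": x, "y": y}


wkb = point_to_wkb(1.5, -2.0)
x, y = wkb_to_point(wkb)
print(len(wkb), point_to_xy(x, y))
```

The WKB form is opaque to Parquet, while the x/y form exposes the coordinates as ordinary double columns.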