|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Apache Arrow Rust 57.0.0 Release" |
| 4 | +date: "2025-10-30 00:00:00" |
| 5 | +author: pmc |
| 6 | +categories: [release] |
| 7 | +--- |
| 8 | +<!-- |
| 9 | +{% comment %} |
| 10 | +Licensed to the Apache Software Foundation (ASF) under one or more |
| 11 | +contributor license agreements. See the NOTICE file distributed with |
| 12 | +this work for additional information regarding copyright ownership. |
| 13 | +The ASF licenses this file to you under the Apache License, Version 2.0 |
| 14 | +(the "License"); you may not use this file except in compliance with |
| 15 | +the License. You may obtain a copy of the License at |
| 16 | +
|
| 17 | +http://www.apache.org/licenses/LICENSE-2.0 |
| 18 | +
|
| 19 | +Unless required by applicable law or agreed to in writing, software |
| 20 | +distributed under the License is distributed on an "AS IS" BASIS, |
| 21 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 22 | +See the License for the specific language governing permissions and |
| 23 | +limitations under the License. |
| 24 | +{% endcomment %} |
| 25 | +--> |
| 26 | + |
| 27 | +The Apache Arrow team is pleased to announce that the v57.0.0 release of Apache Arrow |
| 28 | +Rust is now available on crates.io ([arrow] and [parquet]) and as [source download]. |
| 29 | + |
| 30 | +[arrow]: https://crates.io/crates/arrow |
| 31 | +[parquet]: https://crates.io/crates/parquet |
| 32 | +[source download]: https://dist.apache.org/repos/dist/release/arrow/arrow-rs-57.0.0 |
| 33 | + |
| 34 | +See the [57.0.0 changelog] for a full list of changes. |
| 35 | + |
| 36 | +[57.0.0 changelog]: https://github.com/apache/arrow-rs/blob/57.0.0/CHANGELOG.md |
| 37 | + |
| 38 | + |
| 39 | +## New Features |
| 40 | + |
| 41 | +Note: Arrow Rust hosts the development of the [parquet] crate, a |
| 42 | +high-performance Rust implementation of [Apache Parquet]. |
| 43 | + |
| 44 | +### Performance: 4x Faster Parquet Metadata Parsing 🚀 |
| 45 | + |
| 46 | +Ed Seidl ([@etseidl]) and Jörn Horstmann ([@jhorstmann]) contributed a rewritten |
| 47 | +Thrift metadata parser for Parquet files that is almost 4x faster than the |
| 48 | +previous parser based on the `thrift` crate. This is especially exciting for |
| 49 | +low-latency use cases and reading Parquet files with large amounts of metadata (e.g. |
| 50 | +many row groups or columns). |
| 51 | +See the [blog post about the new Parquet metadata parser] for more details. |
| 52 | + |
| 53 | +<div style="display: flex; gap: 16px; justify-content: center; align-items: flex-start;"> |
| 54 | + <img src="{{ site.baseurl }}/img/rust-parquet-metadata/results.png" width="100%" class="img-responsive" alt="" aria-hidden="true"> |
| 55 | +</div> |
| 56 | + |
| 57 | +*Figure 1:* Performance improvements of [Apache Parquet] metadata parsing between versions `56.2.0` and `57.0.0`. |
| 58 | + |
| 59 | + |
| 60 | +[Apache Parquet]: https://parquet.apache.org/ |
| 61 | +[@etseidl]: https://github.com/etseidl |
| 62 | +[@jhorstmann]: https://github.com/jhorstmann |
| 63 | + |
| 64 | +[blog post about the new Parquet metadata parser]: https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/ |
| 65 | + |
| 66 | +### New `arrow-avro` Crate |
| 67 | + |
| 68 | +The `57.0.0` release introduces a new [`arrow-avro`] crate contributed by [@jecsand838] |
| 69 | +and [@nathaniel-d-ef] that provides much more efficient conversion between |
| 70 | +[Apache Avro](https://avro.apache.org/) and Arrow `RecordBatch`es, as well as broader feature support. |
| 71 | + |
| 72 | +Previously, Arrow‑based systems that read or wrote Avro data |
| 73 | +typically used the general‑purpose [apache-avro] crate. While mature and |
| 74 | +feature‑complete, its row-oriented API does not support features such as |
| 75 | +projection pushdown or vectorized execution. The new `arrow-avro` crate supports |
| 76 | +these features efficiently by converting Avro data directly into Arrow's |
| 77 | +columnar format. |
| 78 | + |
| 79 | +See the [blog post about adding arrow-avro] for more details. |
| 80 | + |
| 81 | +<div style="display: flex; gap: 16px; justify-content: center; align-items: flex-start; padding: 20px 15px;"> |
| 82 | +<img src="{{ site.baseurl }}/img/introducing-arrow-avro/arrow-avro-architecture.svg" |
| 83 | + width="100%" |
| 84 | + alt="High-level `arrow-avro` architecture" |
| 85 | + style="background:#fff"> |
| 86 | +</div> |
| 87 | + |
| 88 | +*Figure 2:* Architecture of the `arrow-avro` crate. |
| 89 | + |
| 90 | + |
| 91 | +[@jecsand838]: https://github.com/jecsand838 |
| 92 | +[@nathaniel-d-ef]: https://github.com/nathaniel-d-ef |
| 93 | +[apache-avro]: https://crates.io/crates/apache-avro |
| 94 | +[`arrow-avro`]: https://crates.io/crates/arrow-avro |
| 95 | + |
| 96 | +[blog post about adding arrow-avro]: https://arrow.apache.org/blog/2025/10/23/introducing-arrow-avro/ |
| 97 | + |
| 98 | + |
| 99 | +### Parquet Variant Support 🧬 |
| 100 | + |
| 101 | +The Apache Parquet project recently added a [new `Variant` type] for |
| 102 | +representing semi-structured data. The `57.0.0` release includes support for reading and |
| 103 | +writing both normal and shredded `Variant` values to and from Parquet files. It |
| 104 | +also includes [parquet-variant], a complete library for working with `Variant` |
| 105 | +values, [`VariantArray`] for working with arrays of `Variant` values in Apache |
| 106 | +Arrow, computation kernels for converting to/from JSON and Arrow types, |
| 107 | +extracting paths, and shredding values. |
| 108 | + |
| 109 | +[new `Variant` type]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md |
| 110 | +[`VariantArray`]: https://docs.rs/parquet/latest/parquet/variant/struct.VariantArray.html |
| 111 | +[parquet-variant]: https://crates.io/crates/parquet-variant |
| 112 | + |
| 113 | +```rust |
| 114 | +// Use the VariantArrayBuilder to build a VariantArray |
| 115 | +let mut builder = VariantArrayBuilder::new(3); |
| 116 | +builder.new_object().with_field("name", "Alice").finish(); // row 1: {"name": "Alice"} |
| 117 | +builder.append_value("such wow"); // row 2: "such wow" (a string) |
| 118 | +let array = builder.build(); |
| 119 | + |
| 120 | +// Since VariantArray is an ExtensionType, it needs to be converted |
| 121 | +// to an ArrayRef and Field with the appropriate metadata |
| 122 | +// before it can be written to a Parquet file |
| 123 | +let field = array.field("data"); |
| 124 | +let array = ArrayRef::from(array); |
| 125 | +// create a RecordBatch with the VariantArray |
| 126 | +let schema = Schema::new(vec![field]); |
| 127 | +let batch = RecordBatch::try_new(Arc::new(schema), vec![array])?; |
| 128 | + |
| 129 | +// Now you can write the RecordBatch to the Parquet file, as normal |
| 130 | +let file = std::fs::File::create("variant.parquet")?; |
| 131 | +let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?; |
| 132 | +writer.write(&batch)?; |
| 133 | +writer.close()?; |
| 134 | +``` |
| 135 | + |
| 136 | + |
| 137 | +This support is being integrated into downstream projects, including DataFusion |
| 138 | +(via [@friendlymatthew]'s [`datafusion-variant`] crate) |
| 139 | +and [delta-rs]. While this support is still experimental, we believe the APIs |
| 140 | +are mostly complete and do not expect major changes. Please consider trying |
| 141 | +it out and providing feedback and improvements. |
| 142 | + |
| 143 | +[`datafusion-variant`]: https://github.com/datafusion-contrib/datafusion-variant |
| 144 | +[delta-rs]: https://github.com/delta-io/delta-rs/issues/3637 |
| 145 | + |
| 146 | +Thanks to the many contributors who made this possible, including: |
| 147 | +* Ryan Johnson ([@scovich]), Congxian Qiu ([@klion26]), and Liam Bao ([@liamzwbao]) for completing the implementation |
| 148 | +* Li Jiaying ([@PinkCrow007]), Aditya Bhatnagar ([@carpecodeum]), and Malthe Karbo ([@mkarbo]) for |
| 149 | +initiating the work |
| 150 | +* Everyone else who has contributed, including [@superserious-dev], [@friendlymatthew], [@micoo227], [@Weijun-H], |
| 151 | + [@harshmotw-db], [@odysa], [@viirya], [@adriangb], [@kosiew], [@codephage2020], |
| 152 | + [@ding-young], [@mbrobbel], [@petern48], [@sdf-jkl], [@abacef], and [@mprammer]. |
| 153 | + |
| 154 | +[@PinkCrow007]: https://github.com/PinkCrow007 |
| 155 | +[@mkarbo]: https://github.com/mkarbo |
| 156 | +[@carpecodeum]: https://github.com/carpecodeum |
| 157 | +[@scovich]: https://github.com/scovich |
| 158 | +[@superserious-dev]: https://github.com/superserious-dev |
| 159 | +[@friendlymatthew]: https://github.com/friendlymatthew |
| 160 | +[@micoo227]: https://github.com/micoo227 |
| 161 | +[@Weijun-H]: https://github.com/Weijun-H |
| 162 | +[@harshmotw-db]: https://github.com/harshmotw-db |
| 163 | +[@odysa]: https://github.com/odysa |
| 164 | +[@viirya]: https://github.com/viirya |
| 165 | +[@klion26]: https://github.com/klion26 |
| 166 | +[@adriangb]: https://github.com/adriangb |
| 167 | +[@kosiew]: https://github.com/kosiew |
| 168 | +[@liamzwbao]: https://github.com/liamzwbao |
| 169 | +[@codephage2020]: https://github.com/codephage2020 |
| 170 | +[@ding-young]: https://github.com/ding-young |
| 171 | +[@mbrobbel]: https://github.com/mbrobbel |
| 172 | +[@petern48]: https://github.com/petern48 |
| 173 | +[@sdf-jkl]: https://github.com/sdf-jkl |
| 174 | +[@abacef]: https://github.com/abacef |
| 175 | +[@mprammer]: https://github.com/mprammer |
| 176 | + |
| 177 | +See the ticket [Variant type support in Parquet #6736] for more details. |
| 178 | + |
| 179 | + |
| 180 | +[Variant type support in Parquet #6736]: https://github.com/apache/arrow-rs/issues/6736 |
| 181 | + |
| 182 | + |
| 183 | +### Parquet Geometry Support 🗺️ |
| 184 | + |
| 185 | + |
| 186 | +The `57.0.0` release also includes support for reading and writing [Parquet Geometry |
| 187 | +types], `GEOMETRY` and `GEOGRAPHY`, including `GeospatialStatistics` |
| 188 | +contributed by Kyle Barron ([@kylebarron]), Dewey Dunnington ([@paleolimbot]), |
| 189 | +Kaushik Srinivasan ([@kaushiksrini]), and Blake Orth ([@BlakeOrth]). |
| 190 | + |
| 191 | +Please see the [Implement Geometry and Geography type support in Parquet] tracking ticket for more details. |
| 192 | + |
| 193 | +[@kylebarron]: https://github.com/kylebarron |
| 194 | +[@paleolimbot]: https://github.com/paleolimbot |
| 195 | +[@kaushiksrini]: https://github.com/kaushiksrini |
| 196 | +[@BlakeOrth]: https://github.com/BlakeOrth |
| 197 | + |
| 198 | +[Parquet Geometry types]: https://github.com/apache/parquet-format/blob/master/Geospatial.md |
| 199 | + |
| 200 | + |
| 201 | +[Implement Geometry and Geography type support in Parquet]: https://github.com/apache/arrow-rs/issues/8373 |
| 202 | + |
| 203 | +## Thanks to Our Contributors |
| 204 | +```console |
| 205 | +$ git shortlog -sn 56.0.0..57.0.0 |
| 206 | + 36 Matthijs Brobbel |
| 207 | + 20 Andrew Lamb |
| 208 | + 13 Ryan Johnson |
| 209 | + 11 Ed Seidl |
| 210 | + 10 Connor Sanders |
| 211 | + 8 Alex Huang |
| 212 | + 5 Emil Ernerfeldt |
| 213 | + 5 Liam Bao |
| 214 | + 5 Matthew Kim |
| 215 | + 4 nathaniel-d-ef |
| 216 | + 3 Raz Luvaton |
| 217 | + 3 albertlockett |
| 218 | + 3 dependabot[bot] |
| 219 | + 3 mwish |
| 220 | + 2 Ben Ye |
| 221 | + 2 Congxian Qiu |
| 222 | + 2 Dewey Dunnington |
| 223 | + 2 Kyle Barron |
| 224 | + 2 Lilian Maurel |
| 225 | + 2 Mark Nash |
| 226 | + 2 Nuno Faria |
| 227 | + 2 Pepijn Van Eeckhoudt |
| 228 | + 2 Tobias Schwarzinger |
| 229 | + 2 lichuang |
| 230 | + 1 Adam Gutglick |
| 231 | + 1 Adam Reeve |
| 232 | + 1 Alex Stephen |
| 233 | + 1 Chen Chongchen |
| 234 | + 1 Jack |
| 235 | + 1 Jeffrey Vo |
| 236 | + 1 Jörn Horstmann |
| 237 | + 1 Kaushik Srinivasan |
| 238 | + 1 Li Jiaying |
| 239 | + 1 Lin Yihai |
| 240 | + 1 Marco Neumann |
| 241 | + 1 Piotr Findeisen |
| 242 | + 1 Piotr Srebrny |
| 243 | + 1 Samuele Resca |
| 244 | + 1 Van De Bio |
| 245 | + 1 Yan Tingwang |
| 246 | + 1 ding-young |
| 247 | + 1 kosiew |
| 248 | + 1 张林伟 |
| 249 | +``` |