Commit b4cd319

alamb and Copilot authored
Website: Add blog post for arrow-rs 57.0.0 (#720)
- Closes apache/arrow-rs#8463

Preview URL: https://alamb.github.io/arrow-site/blog/2025/09/04/arrow-rs-57.0.0/

This release has a crazy amount of content so we should tell the world about it.

Here are two related blogs:
- #712
- #711

---------

Co-authored-by: Copilot <[email protected]>
1 parent fd5903e commit b4cd319

1 file changed: 249 additions & 0 deletions

---
layout: post
title: "Apache Arrow Rust 57.0.0 Release"
date: "2025-10-30 00:00:00"
author: pmc
categories: [release]
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

The Apache Arrow team is pleased to announce that the v57.0.0 release of Apache Arrow
Rust is now available on crates.io ([arrow] and [parquet]) and as a [source download].

[arrow]: https://crates.io/crates/arrow
[parquet]: https://crates.io/crates/parquet
[source download]: https://dist.apache.org/repos/dist/release/arrow/arrow-rs-57.0.0

See the [57.0.0 changelog] for a full list of changes.

[57.0.0 changelog]: https://github.com/apache/arrow-rs/blob/57.0.0/CHANGELOG.md


## New Features

Note: Arrow Rust hosts the development of the [parquet] crate, a
high-performance Rust implementation of [Apache Parquet].

### Performance: 4x Faster Parquet Metadata Parsing 🚀

Ed Seidl ([@etseidl]) and Jörn Horstmann ([@jhorstmann]) contributed a rewritten
Thrift metadata parser for Parquet files that is almost 4x faster than the
previous parser based on the `thrift` crate. This is especially exciting for
low-latency use cases and for reading Parquet files with large amounts of
metadata (e.g. many row groups or columns).
See the [blog post about the new Parquet metadata parser] for more details.

<div style="display: flex; gap: 16px; justify-content: center; align-items: flex-start;">
<img src="{{ site.baseurl }}/img/rust-parquet-metadata/results.png" width="100%" class="img-responsive" alt="" aria-hidden="true">
</div>

*Figure 1:* Performance improvements of [Apache Parquet] metadata parsing between version `56.2.0` and `57.0.0`.


[Apache Parquet]: https://parquet.apache.org/
[@etseidl]: https://github.com/etseidl
[@jhorstmann]: https://github.com/jhorstmann

[blog post about the new Parquet metadata parser]: https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/
66+
### New `arrow-avro` Crate
67+
68+
The `57.0.0` release introduces a new [`arrow-avro`] crate contributed by [@jecsand838]
69+
and [@nathaniel-d-ef] that provides much more efficient conversion between
70+
[Apache Avro](https://avro.apache.org/) and Arrow `RecordBatch`es, as well as broader feature support.
71+
72+
Previously, Arrow‑based systems that read or wrote Avro data
73+
typically used the general‑purpose [apache-avro] crate. While mature and
74+
feature‑complete, its row-oriented API does not support features such as
75+
projection pushdown or vectorized execution. The new `arrow-avro` crate supports
76+
these features efficiently by converting Avro data directly into Arrow's
77+
columnar format.
78+
79+
See the [blog post about adding arrow-avro] for more details.
80+
81+
<div style="display: flex; gap: 16px; justify-content: center; align-items: flex-start; padding: 20px 15px;">
82+
<img src="{{ site.baseurl }}/img/introducing-arrow-avro/arrow-avro-architecture.svg"
83+
width="100%"
84+
alt="High-level `arrow-avro` architecture"
85+
style="background:#fff">
86+
</div>
87+
88+
*Figure 2:* Architecture of the `arrow-avro` crate.
89+
90+
91+
[@jecsand838]: https://github.com/jecsand838
92+
[@nathaniel-d-ef]: https://github.com/nathaniel-d-ef
93+
[apache-avro]: https://crates.io/crates/apache-avro
94+
[`arrow-avro`]: https://crates.io/crates/arrow-avro
95+
96+
[blog post about adding arrow-avro]: https://arrow.apache.org/blog/2025/10/23/introducing-arrow-avro/
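
To illustrate what decoding directly into columns with projection pushdown
means, here is a minimal sketch that uses only the standard library. It does
not use `arrow-avro` or the Avro wire format: the record layout, the `Field`
and `Columns` types, and the `decode_projected` function are all hypothetical
and exist only for this example.

```rust
/// Fields of a hypothetical record: `id: i64`, `name: string`, `score: f64`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Field {
    Id,
    Name,
    Score,
}

/// Column-oriented output. A column is `None` when it was not projected.
#[derive(Debug, PartialEq)]
struct Columns {
    id: Option<Vec<i64>>,
    name: Option<Vec<String>>,
    score: Option<Vec<f64>>,
}

/// Splits off the first `n` bytes of `buf`, or returns `None` if it is too short.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    if buf.len() < n {
        return None;
    }
    let (head, tail) = buf.split_at(n);
    *buf = tail;
    Some(head)
}

/// Copies exactly eight bytes into a fixed-size array.
fn le8(bytes: &[u8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    out.copy_from_slice(bytes);
    out
}

/// Decodes records laid out as: 8-byte little-endian `id`, 1-byte length
/// followed by a UTF-8 `name`, 8-byte little-endian `score`.
/// Only projected fields are materialized; other bytes are just skipped.
fn decode_projected(mut buf: &[u8], projection: &[Field]) -> Option<Columns> {
    let want = |f: Field| projection.contains(&f);
    let mut out = Columns {
        id: want(Field::Id).then(Vec::new),
        name: want(Field::Name).then(Vec::new),
        score: want(Field::Score).then(Vec::new),
    };
    while !buf.is_empty() {
        let id_bytes = take(&mut buf, 8)?;
        if let Some(col) = out.id.as_mut() {
            col.push(i64::from_le_bytes(le8(id_bytes)));
        }
        let name_len = take(&mut buf, 1)?[0] as usize;
        let name_bytes = take(&mut buf, name_len)?;
        if let Some(col) = out.name.as_mut() {
            // The string is only validated and allocated when requested.
            col.push(String::from_utf8(name_bytes.to_vec()).ok()?);
        }
        let score_bytes = take(&mut buf, 8)?;
        if let Some(col) = out.score.as_mut() {
            col.push(f64::from_le_bytes(le8(score_bytes)));
        }
    }
    Some(out)
}
```

A row-oriented API would have to build every field of every record before the
caller could discard the ones it does not need. In this sketch, a projection of
`[Field::Id, Field::Score]` never allocates a single `String`.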


### Parquet Variant Support 🧬

The Apache Parquet project recently added a [new `Variant` type] for
representing semi-structured data. The `57.0.0` release includes support for reading and
writing both unshredded and shredded `Variant` values to and from Parquet files. It
also includes [parquet-variant], a complete library for working with `Variant`
values; [`VariantArray`], for working with arrays of `Variant` values in Apache
Arrow; and computation kernels for converting to and from JSON and Arrow types,
extracting paths, and shredding values.

[new `Variant` type]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
[`VariantArray`]: https://docs.rs/parquet/latest/parquet/variant/struct.VariantArray.html
[parquet-variant]: https://crates.io/crates/parquet-variant

```rust
// Imports assumed for this snippet; `parquet::variant` requires the
// `variant_experimental` feature of the `parquet` crate.
use std::sync::Arc;
use arrow_array::{ArrayRef, RecordBatch};
use arrow_schema::Schema;
use parquet::arrow::ArrowWriter;
use parquet::variant::{VariantArrayBuilder, VariantBuilderExt};

// Use the VariantArrayBuilder to build a VariantArray
let mut builder = VariantArrayBuilder::new(3);
builder.new_object().with_field("name", "Alice").finish(); // row 1: {"name": "Alice"}
builder.append_value("such wow"); // row 2: "such wow" (a string)
let array = builder.build();

// Since VariantArray is an ExtensionType, it needs to be converted
// to an ArrayRef and Field with the appropriate metadata
// before it can be written to a Parquet file
let field = array.field("data");
let array = ArrayRef::from(array);
// create a RecordBatch with the VariantArray
let schema = Schema::new(vec![field]);
let batch = RecordBatch::try_new(Arc::new(schema), vec![array])?;

// Now you can write the RecordBatch to the Parquet file, as normal
let file = std::fs::File::create("variant.parquet")?;
let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
writer.write(&batch)?;
writer.close()?;
```


This support is being integrated into query engines and other projects, such as
DataFusion via [@friendlymatthew]'s [`datafusion-variant`] crate, and
[delta-rs]. While this support is still experimental, we believe the APIs
are mostly complete and do not expect major changes. Please consider trying
it out and providing feedback and improvements.

[`datafusion-variant`]: https://github.com/datafusion-contrib/datafusion-variant
[delta-rs]: https://github.com/delta-io/delta-rs/issues/3637

Thanks to the many contributors who made this possible, including:
* Ryan Johnson ([@scovich]), Congxian Qiu ([@klion26]), and Liam Bao ([@liamzwbao]) for completing the implementation
* Li Jiaying ([@PinkCrow007]), Aditya Bhatnagar ([@carpecodeum]), and Malthe Karbo ([@mkarbo]) for
  initiating the work
* Everyone else who has contributed, including [@superserious-dev], [@friendlymatthew], [@micoo227], [@Weijun-H],
  [@harshmotw-db], [@odysa], [@viirya], [@adriangb], [@kosiew], [@codephage2020],
  [@ding-young], [@mbrobbel], [@petern48], [@sdf-jkl], [@abacef], and [@mprammer].

[@PinkCrow007]: https://github.com/PinkCrow007
[@mkarbo]: https://github.com/mkarbo
[@carpecodeum]: https://github.com/carpecodeum
[@scovich]: https://github.com/scovich
[@superserious-dev]: https://github.com/superserious-dev
[@friendlymatthew]: https://github.com/friendlymatthew
[@micoo227]: https://github.com/micoo227
[@Weijun-H]: https://github.com/Weijun-H
[@harshmotw-db]: https://github.com/harshmotw-db
[@odysa]: https://github.com/odysa
[@viirya]: https://github.com/viirya
[@klion26]: https://github.com/klion26
[@adriangb]: https://github.com/adriangb
[@kosiew]: https://github.com/kosiew
[@liamzwbao]: https://github.com/liamzwbao
[@codephage2020]: https://github.com/codephage2020
[@ding-young]: https://github.com/ding-young
[@mbrobbel]: https://github.com/mbrobbel
[@petern48]: https://github.com/petern48
[@sdf-jkl]: https://github.com/sdf-jkl
[@abacef]: https://github.com/abacef
[@mprammer]: https://github.com/mprammer

See the ticket [Variant type support in Parquet #6736] for more details.


[Variant type support in Parquet #6736]: https://github.com/apache/arrow-rs/issues/6736
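
For readers new to the term, "shredding" roughly means storing values that
match an expected type in a typed column while keeping everything else in a
generic fallback column. The sketch below models that idea with only the
standard library. It is a simplified illustration, not the `parquet-variant`
API or the on-disk encoding: the `Value` and `ShreddedInts` types and the
`shred_as_int` and `unshred` functions are hypothetical.

```rust
/// A simplified stand-in for a semi-structured value.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Text(String),
}

/// A column shredded as `i64`: for each row, the value lives either in
/// `typed_value` (it was an integer) or in `value` (anything else).
#[derive(Debug, PartialEq)]
struct ShreddedInts {
    typed_value: Vec<Option<i64>>,
    value: Vec<Option<Value>>,
}

/// Splits rows into a typed integer column plus a fallback column.
fn shred_as_int(rows: &[Value]) -> ShreddedInts {
    let mut typed_value = Vec::with_capacity(rows.len());
    let mut value = Vec::with_capacity(rows.len());
    for row in rows {
        match row {
            Value::Int(i) => {
                typed_value.push(Some(*i));
                value.push(None);
            }
            other => {
                typed_value.push(None);
                value.push(Some(other.clone()));
            }
        }
    }
    ShreddedInts { typed_value, value }
}

/// Reassembles the original rows from the two columns.
fn unshred(col: &ShreddedInts) -> Vec<Value> {
    col.typed_value
        .iter()
        .zip(col.value.iter())
        .map(|(typed, fallback)| match (typed, fallback) {
            (Some(i), _) => Value::Int(*i),
            (None, Some(v)) => v.clone(),
            // `shred_as_int` never produces this case; treated as null here.
            (None, None) => Value::Null,
        })
        .collect()
}
```

The appeal of this layout is that a query touching only the integer rows can
scan `typed_value` as an ordinary typed column, and `unshred` shows that no
information is lost for rows of other types.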


### Parquet Geometry Support 🗺️


The `57.0.0` release also includes support for reading and writing the
[Parquet Geometry types], `GEOMETRY` and `GEOGRAPHY`, including
`GeospatialStatistics`. This work was contributed by Kyle Barron ([@kylebarron]),
Dewey Dunnington ([@paleolimbot]), Kaushik Srinivasan ([@kaushiksrini]), and
Blake Orth ([@BlakeOrth]).

Please see the [Implement Geometry and Geography type support in Parquet] tracking ticket for more details.

[@kylebarron]: https://github.com/kylebarron
[@paleolimbot]: https://github.com/paleolimbot
[@kaushiksrini]: https://github.com/kaushiksrini
[@BlakeOrth]: https://github.com/BlakeOrth

[Parquet Geometry types]: https://github.com/apache/parquet-format/blob/master/Geospatial.md


[Implement Geometry and Geography type support in Parquet]: https://github.com/apache/arrow-rs/issues/8373
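
The Parquet geospatial specification describes statistics that include a
bounding box for the values in a column chunk. As a rough illustration of why
such statistics are useful, the standalone sketch below computes a 2D bounding
box and checks whether it could overlap a query window. It does not use the
`parquet` crate: the `BoundingBox` struct and both functions are hypothetical,
and the sketch ignores Z/M dimensions and the wraparound boxes that
`GEOGRAPHY` values can have.

```rust
/// A 2D bounding box (hypothetical type for this example).
#[derive(Clone, Copy, Debug, PartialEq)]
struct BoundingBox {
    xmin: f64,
    xmax: f64,
    ymin: f64,
    ymax: f64,
}

/// Computes the bounding box of a set of points, skipping points with a
/// NaN coordinate. Returns `None` if no usable point remains.
fn bounding_box(points: &[(f64, f64)]) -> Option<BoundingBox> {
    let mut iter = points
        .iter()
        .copied()
        .filter(|(x, y)| !x.is_nan() && !y.is_nan());
    let (x0, y0) = iter.next()?;
    let init = BoundingBox { xmin: x0, xmax: x0, ymin: y0, ymax: y0 };
    Some(iter.fold(init, |b, (x, y)| BoundingBox {
        xmin: b.xmin.min(x),
        xmax: b.xmax.max(x),
        ymin: b.ymin.min(y),
        ymax: b.ymax.max(y),
    }))
}

/// Returns true if the two boxes overlap or touch. A reader could skip
/// any chunk whose box does not intersect the query window.
fn may_intersect(a: &BoundingBox, b: &BoundingBox) -> bool {
    a.xmin <= b.xmax && b.xmin <= a.xmax && a.ymin <= b.ymax && b.ymin <= a.ymax
}
```

With per-chunk boxes stored alongside the data, a spatial filter only needs to
decode the chunks for which `may_intersect` returns true.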

## Thanks to Our Contributors
```console
$ git shortlog -sn 56.0.0..57.0.0
    36  Matthijs Brobbel
    20  Andrew Lamb
    13  Ryan Johnson
    11  Ed Seidl
    10  Connor Sanders
     8  Alex Huang
     5  Emil Ernerfeldt
     5  Liam Bao
     5  Matthew Kim
     4  nathaniel-d-ef
     3  Raz Luvaton
     3  albertlockett
     3  dependabot[bot]
     3  mwish
     2  Ben Ye
     2  Congxian Qiu
     2  Dewey Dunnington
     2  Kyle Barron
     2  Lilian Maurel
     2  Mark Nash
     2  Nuno Faria
     2  Pepijn Van Eeckhoudt
     2  Tobias Schwarzinger
     2  lichuang
     1  Adam Gutglick
     1  Adam Reeve
     1  Alex Stephen
     1  Chen Chongchen
     1  Jack
     1  Jeffrey Vo
     1  Jörn Horstmann
     1  Kaushik Srinivasan
     1  Li Jiaying
     1  Lin Yihai
     1  Marco Neumann
     1  Piotr Findeisen
     1  Piotr Srebrny
     1  Samuele Resca
     1  Van De Bio
     1  Yan Tingwang
     1  ding-young
     1  kosiew
     1  张林伟
```
