[Java]Best practice with Apache/Spark #2015

jayhan94 · 2025-01-20T10:36:59Z

Feature Request

Are there any best practices for using Fury with Apache Spark? Will the community implement such a module?

Is your feature request related to a problem? Please describe

No response

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

chaokunyang · 2025-01-30T06:26:15Z

Hi @jayhan94, we don't have such documentation currently. A better Fury integration with Spark/Flink would require changing the source code of the serialization module in Spark/Flink, which is beyond the scope of this project. Maybe in the future we can submit proposals to the Spark/Flink communities.

Currently, if you want to use Fury in Spark/Flink, you can update your driver program to add chained serialization/deserialization operators (a narrow dependency in Spark).

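Here is a simple Spark RDD example:

val lines = sc.textFile("data.txt")
// Json.parse stands for whatever JSON library turns a line into a Struct
val structSet = lines.map(s => Json.parse(s, classOf[Struct]))
// Serialize each value to bytes before the shuffle
val kvset = structSet.map(s => (s.key, fury.serialize(s)))
// Deserialize the first value of each group after the shuffle
kvset.groupByKey().map(t => (t._1, fury.deserialize(t._2.head))).collect.foreach(println)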

A Flink program would be similar:

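DataStream<Struct> dataStream = xxxstream.map(s -> Json.parse(s, Struct.class));
// Serialize with Fury before the records are redistributed
DataStream<byte[]> byteStream = dataStream.map(s -> fury.serialize(s));
// Deserialize after the rebalance
DataStream<Struct> result = byteStream.rebalance().map(bytes -> (Struct) fury.deserialize(bytes));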
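
To make the pattern in both examples concrete without a cluster, here is a minimal plain-Java sketch of the same three steps: serialize each value before the shuffle, move only opaque byte arrays, and deserialize afterwards. Everything in it is an assumption for illustration: the `Struct` fields, the class and method names, and above all the codec, which uses JDK serialization as a stand-in so the snippet runs with no dependencies. In a real job the two codec methods would call `fury.serialize` and `fury.deserialize` instead, and the grouping would be done by Spark or Flink rather than a local map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShuffleBytesSketch {
    // Hypothetical record type standing in for the Struct used in the examples.
    public static class Struct implements Serializable {
        public final String key;
        public final int value;
        public Struct(String key, int value) { this.key = key; this.value = value; }
    }

    // Stand-in codec (JDK serialization). Assumption: a real job would call fury.serialize here.
    public static byte[] serialize(Object o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in codec. Assumption: a real job would call fury.deserialize here.
    public static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Mirrors the RDD example: map to (key, bytes), group by key, decode the first value per key.
    public static Map<String, Struct> firstPerKey(List<Struct> input) {
        // "Shuffle": only keys and byte arrays cross this boundary.
        Map<String, List<byte[]>> grouped = new LinkedHashMap<>();
        for (Struct s : input) {
            grouped.computeIfAbsent(s.key, k -> new ArrayList<>()).add(serialize(s));
        }
        // Downstream operator: turn the first payload of each group back into a Struct.
        Map<String, Struct> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<byte[]>> e : grouped.entrySet()) {
            out.put(e.getKey(), (Struct) deserialize(e.getValue().get(0)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Struct> input = Arrays.asList(
                new Struct("a", 1), new Struct("b", 2), new Struct("a", 3));
        for (Map.Entry<String, Struct> e : firstPerKey(input).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().value);
        }
    }
}
```

The point of the design is that the framework's own serializer only ever sees `byte[]` values across the shuffle, so the cost of encoding the domain objects is paid in the codec you choose rather than in the framework's default path.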

jayhan94 · 2025-02-02T04:57:02Z

@chaokunyang Thanks for your reply. I wasn't asking about serializing RDD elements by hand. I meant implementing a spark.serializer based on Fury, which could help the shuffle process, just like KryoSerializer.

chaokunyang mentioned this issue Jan 30, 2025

[Question] Is there any guide to how to use Fury on Flink? #2006

Closed
