Pravega Spark Connectors

This repository implements connectors to read and write Pravega Streams with Apache Spark, a high-performance analytics engine for batch and streaming data.

Build end-to-end stream processing and batch pipelines that use Pravega as the stream storage and message bus, and Apache Spark for computation over the streams.

Pravega is an open-source distributed storage service that implements Streams. Its main primitive, the Stream, is a high-performance, durable, elastic, and unlimited append-only byte stream with strict ordering and consistency, intended as a foundation for reliable storage systems.

To learn more about Pravega, visit https://pravega.io

Features & Highlights

  • Exactly-once processing guarantees for both Reader and Writer, supporting end-to-end exactly-once processing pipelines
  • A Spark micro-batch reader connector allows Spark streaming applications to read Pravega Streams. Pravega stream cuts (i.e. offsets) are used to reliably recover from failures and provide exactly-once semantics.
  • A Spark batch reader connector allows Spark batch applications to read Pravega Streams.
  • A Spark writer allows Spark batch and streaming applications to write to Pravega Streams. Writes are optionally contained within Pravega transactions, providing exactly-once semantics.
  • Seamless integration with Spark's checkpoints.
  • Parallel Readers and Writers supporting high throughput and low latency processing.
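
As a rough illustration of how the reader and writer connectors above might be used from PySpark, here is a minimal sketch. The data source name `pravega` and the option names `controller`, `scope`, and `stream` are assumptions about the connector's configuration; they should be checked against the connector documentation for the branch in use. The helper names (`pravega_options`, `read_stream`, `write_stream`) are hypothetical and exist only for this example. The `spark` and `df` arguments are expected to be an existing `SparkSession` and streaming `DataFrame`, with the connector jar already on the classpath.

```python
from typing import Dict, Optional


def pravega_options(controller: str, scope: str, stream: str,
                    extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build the option map for the connector (option names are assumed)."""
    if not (controller and scope and stream):
        raise ValueError("controller, scope and stream are all required")
    options = {"controller": controller, "scope": scope, "stream": stream}
    options.update(extra or {})
    return options


def read_stream(spark, options: Dict[str, str]):
    """Micro-batch read: returns a streaming DataFrame backed by a Pravega Stream."""
    return spark.readStream.format("pravega").options(**options).load()


def write_stream(df, options: Dict[str, str], checkpoint_dir: str):
    """Streaming write to a Pravega Stream.

    checkpointLocation is a standard Spark Structured Streaming option; the
    checkpoint directory is what lets a restarted query resume where it left off.
    """
    return (df.writeStream
              .format("pravega")
              .options(**options)
              .option("checkpointLocation", checkpoint_dir)
              .start())
```

A caller might build one option map per stream, for example `pravega_options("tcp://localhost:9090", "examples", "input")` (placeholder controller address, scope, and stream names), pass it to `read_stream`, transform the result, and hand it to `write_stream` with a second option map and a checkpoint directory. The sink may expect specific column names in the DataFrame it receives; that detail is not covered here.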

Documentation

Compatibility Matrix

The master branch always tracks the most recent supported versions of Spark and Pravega.

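| Spark Version | Pravega Version | Java Version To Build Connector | Java Version To Run Connector | Git Branch     |
|---------------|-----------------|---------------------------------|-------------------------------|----------------|
| 3.1           | 0.10            | Java 11                         | Java 8 or 11                  | master         |
| 3.0           | 0.10            | Java 11                         | Java 8 or 11                  | r0.10-spark3.0 |
| 2.4           | 0.10            | Java 8                          | Java 8                        | r0.10-spark2.4 |
| 3.0           | 0.9             | Java 11                         | Java 8 or 11                  | r0.9           |
| 2.4           | 0.9             | Java 8                          | Java 8                        | r0.9-spark2.4  |
| 3.0           | 0.8             | Java 8                          | Java 8                        | r0.8-spark3.0  |
| 2.4           | 0.8             | Java 8                          | Java 8                        | r0.8-spark2.4  |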

Support

Don’t hesitate to ask! If you need help, contact the developers and community on Slack (signup). If you find a bug, open an issue on GitHub Issues.

About

Spark Connectors for Pravega is 100% open source and community-driven. All components are available under the Apache 2 License on GitHub.