Batch Data Import

This section covers batch importing data into Apache Spark, as in the non-streaming examples from Chapter 1. Those examples load data from files all at once into one RDD and process that RDD; then the job completes and the program exits. In a production system, you could set up a cron job to kick off a batch job each night that processes the previous day's log files and publishes statistics for that day.
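As a rough illustration of that nightly pattern, the sketch below builds a path for the previous day's logs, loads every matching file into one RDD, and stops the context when done. It is a minimal sketch rather than the Chapter 1 code: the directory layout (`/var/log/myapp/YYYY-MM-DD/*.log`), the app name, and the helper names `previous_day_log_glob` and `run_batch` are all assumptions made for this example, and the line count stands in for whatever statistics the real job would publish.

```python
import datetime


def previous_day_log_glob(today, log_root="/var/log/myapp"):
    """Build a glob matching the log files written the day before `today`.

    The layout <log_root>/YYYY-MM-DD/*.log is a hypothetical convention
    chosen for this sketch, not something Spark requires.
    """
    yesterday = today - datetime.timedelta(days=1)
    return "{}/{}/*.log".format(log_root, yesterday.isoformat())


def run_batch(log_glob):
    """Load all matching files into one RDD, process it, and return a count."""
    # Imported here so the path helper can be used without Spark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("NightlyLogBatch")  # hypothetical app name
    sc = SparkContext(conf=conf)
    try:
        lines = sc.textFile(log_glob)  # one RDD over every matching file
        return lines.count()           # placeholder for the real statistics
    finally:
        sc.stop()                      # the job is done; the program can exit
```

A driver script launched by cron could then call `run_batch(previous_day_log_glob(datetime.date.today()))` and publish the result wherever the statistics are needed.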

Importing From Files covers caveats when importing data from files.
Importing from Databases links to examples of reading data from databases.
