Batch Data Import

This section covers batch importing data into Apache Spark, as in the non-streaming examples from Chapter 1. Those examples load data from files into a single RDD all at once and process that RDD; then the job completes and the program exits. In a production system, you could set up a cron job to kick off a batch job each night that processes the previous day's log files and publishes statistics for that day.
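
The load, process, and exit flow described above might look roughly like the following PySpark sketch. It is illustrative only and is not the Chapter 1 code: the names `parse_content_size` and `run_batch`, the app name, and the assumption that each log line ends with the response size in bytes (as in the Apache common log format) are all hypothetical choices made for this example.

```python
def parse_content_size(line):
    """Return the trailing response size (bytes) of a log line, or None.

    Assumes the Apache common log format, where the last field is the
    response size or "-" when no body was sent.
    """
    fields = line.split()
    if not fields:
        return None
    last = fields[-1]
    return int(last) if last.isdigit() else None


def run_batch(input_path):
    """Load every file under input_path into one RDD, compute stats, exit."""
    # Imported here so the parsing helper can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="BatchLogImport")  # hypothetical app name
    try:
        # textFile reads all matching files at once into a single RDD of lines.
        lines = sc.textFile(input_path)
        sizes = (lines.map(parse_content_size)
                      .filter(lambda size: size is not None)
                      .cache())
        return {"requests": sizes.count(), "total_bytes": sizes.sum()}
    finally:
        # Stopping the context ends the job so the program can exit.
        sc.stop()
```

To run this nightly, one option is to end the script with a call to `run_batch` on the directory holding the previous day's logs, and have a cron entry launch it with `spark-submit`. The exact path and schedule depend on your deployment.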