3 files changed +21 -3 lines changed

@@ -31,7 +31,7 @@ Original Creator of Apache Spark <br>
### Author: [Mahmoud Parsian](https://www.linkedin.com/in/mahmoudparsian/)

- ### Goal of this book: enable writing efficient & simpler PySpark code for data algorithms using Spark
+ ### [Goal of this book: Data Algorithms with Spark](./docs/goal_of_book.md)

### [Story of this book: Data Algorithms with Spark](./docs/story_of_book.md)

# -----------------------------------------------------
# @author Mahmoud Parsian
# -----------------------------------------------------
- export SPARK_HOME="/book/spark-3.2.0"
+ export SPARK_HOME="/book/spark-3.4.0"
export INPUT_PATH="/book/code/chap10/sample_numbers.txt"
- export SPARK_PROG="/book/code/chap10/minmax_use_mappartitions.py"
+ export SPARK_PROG="/book/code/chap10/minmax_use_mappartitions_v2.py"
#
# run the PySpark program:
$SPARK_HOME/bin/spark-submit $SPARK_PROG $INPUT_PATH

+ # Goal of this book: Data Algorithms with Spark
+
+ 1. Keep it SIMPLE!
+
+ 2. Goal of this book: enable writing efficient &
+ simpler PySpark code for data algorithms using Spark
+
+ 3. A lot of [working PySpark code](../code/) is provided
+ so that the reader can understand how to use basic
+ transformations on using RDDs and DataFrames
+
+ 4. As much as possible, I have avoided writing complex
+ code and functions: keep it simple so that you can
+ debug easily and your co-workers can understand them.
+
+ 5. CUT-and-PASTE: you may take portions of the [code](../code/)
+ and tailor it to your needs
+