
Commit a6081a3

updated pyspark tutorial
1 parent c197b89 commit a6081a3

File tree

2 files changed: +26, -18 lines changed


Diff for: code/bonus_chapters/pyspark_tutorial/README.md (+13, -9)

@@ -3,7 +3,11 @@
 * Spark is a multi-language engine for executing data engineering,
 data science, and machine learning on single-node machines or clusters.
 
-* PySpark is the Python API for Spark.
+* PySpark is the Python API for Spark.
+
+Note:
+
+* Assume that `%` is an operating system command prompt
 
 # Start PySpark
 
@@ -18,24 +22,23 @@ are going to run PySpark shell in your laptop/macbook,
 then you do not need to start any cluster -- your
 laptop/macbook acts as a cluster of a single node:
 
-export SPARK_HOME=<installed-directory-for-spark>
-cd $SPARK_HOME
-./sbin/start-all.sh
+% export SPARK_HOME=<installed-directory-for-spark>
+% $SPARK_HOME/sbin/start-all.sh
 
 
 # Invoke PySpark Shell
 
 To start PySpark, execute the following:
 
 
-cd $SPARK_HOME
-./bin/pyspark
+% export SPARK_HOME=<installed-directory-for-spark>
+% $SPARK_HOME/bin/pyspark
 
 
 Successful execution will give you the PySpark prompt:
 
 
-~ % ./spark-3.3.0/bin/pyspark
+% $SPARK_HOME/bin/pyspark
 Python 3.10.5 (v3.10.5:f377153967, Jun  6 2022, 12:36:10)
 Welcome to
      ____              __
@@ -52,8 +55,9 @@ Successful execution will give you the PySpark prompt:
 
 
 Note that the shell has already created two objects:
-* SparkContext (`sc`) object and you may use it to create RDDs.
-* SparkSession (`spark`) object and you may use it to create DataFrames.
+* `sc` : a SparkContext object; you may use it to create RDDs.
+* `spark` : a SparkSession object; you may use it to create DataFrames.
+
 
 # Creating RDDs
 

Diff for: code/bonus_chapters/pyspark_tutorial/pyspark_tutorial.md (+13, -9)

@@ -3,7 +3,11 @@
 * Spark is a multi-language engine for executing data engineering,
 data science, and machine learning on single-node machines or clusters.
 
-* PySpark is the Python API for Spark.
+* PySpark is the Python API for Spark.
+
+Note:
+
+* Assume that `%` is an operating system command prompt
 
 # Start PySpark
 
@@ -18,24 +22,23 @@ are going to run PySpark shell in your laptop/macbook,
 then you do not need to start any cluster -- your
 laptop/macbook acts as a cluster of a single node:
 
-export SPARK_HOME=<installed-directory-for-spark>
-cd $SPARK_HOME
-./sbin/start-all.sh
+% export SPARK_HOME=<installed-directory-for-spark>
+% $SPARK_HOME/sbin/start-all.sh
 
 
 # Invoke PySpark Shell
 
 To start PySpark, execute the following:
 
 
-cd $SPARK_HOME
-./bin/pyspark
+% export SPARK_HOME=<installed-directory-for-spark>
+% $SPARK_HOME/bin/pyspark
 
 
 Successful execution will give you the PySpark prompt:
 
 
-~ % ./spark-3.3.0/bin/pyspark
+% $SPARK_HOME/bin/pyspark
 Python 3.10.5 (v3.10.5:f377153967, Jun  6 2022, 12:36:10)
 Welcome to
      ____              __
@@ -52,8 +55,9 @@ Successful execution will give you the PySpark prompt:
 
 
 Note that the shell has already created two objects:
-* SparkContext (`sc`) object and you may use it to create RDDs.
-* SparkSession (`spark`) object and you may use it to create DataFrames.
+* `sc` : a SparkContext object; you may use it to create RDDs.
+* `spark` : a SparkSession object; you may use it to create DataFrames.
+
 
 # Creating RDDs
 
