* Spark is a multi-language engine for executing data engineering,
data science, and machine learning on single-node machines or clusters.

* PySpark is the Python API for Spark.

Note:

* Assume that ` % ` is an operating system command prompt.

# Start PySpark

If you are going to run the PySpark shell on your laptop/macbook,
then you do not need to start any cluster; your
laptop/macbook acts as a single-node cluster:

    % export SPARK_HOME=<installed-directory-for-spark>
    % $SPARK_HOME/sbin/start-all.sh
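
If the scripts complete successfully, one quick way to verify that the master and worker are running is ` jps `, a JDK tool that lists running JVM processes (this assumes a JDK is on your PATH; the process ids below are illustrative):

    % jps
    12345 Master
    12346 Worker
    12347 Jps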

# Invoke PySpark Shell

To start PySpark, execute the following:

    % export SPARK_HOME=<installed-directory-for-spark>
    % $SPARK_HOME/bin/pyspark

Successful execution will give you the PySpark prompt:

    % $SPARK_HOME/bin/pyspark
    Python 3.10.5 (v3.10.5:f377153967, Jun 6 2022, 12:36:10)
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 3.3.0
          /_/

Note that the shell has already created two objects:

* ` sc `: a SparkContext object; you may use it to create RDDs.
* ` spark `: a SparkSession object; you may use it to create DataFrames, as shown in the example below.
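
For example, a short session that uses both objects might look like this (the sample data and output are illustrative):

    >>> rdd = sc.parallelize([1, 2, 3, 4])   # create an RDD from a Python list
    >>> rdd.count()
    4
    >>> df = spark.createDataFrame([("alice", 30), ("bob", 40)], ["name", "age"])
    >>> df.show()
    +-----+---+
    | name|age|
    +-----+---+
    |alice| 30|
    |  bob| 40|
    +-----+---+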

# Creating RDDs