Commit 06f69a6

updated docs
1 parent 5110cb1 commit 06f69a6

4 files changed: +51 -2 lines changed

README.md

+1-1
@@ -1,6 +1,6 @@
 ## [O'Reilly book: <span style="color:red">Data Algorithms with Spark</span>](https://www.oreilly.com/library/view/data-algorithms-with/9781492082378/)
 
-## [Foreword by Dr. Matei Zaharia (Original Creator of Apache Spark)](./images/data-alg-foreword2.pdf)
+## [Foreword by Dr. Matei Zaharia (Original Creator of Apache Spark)](./images/FOREWORD_by_Dr_Matei_Zaharia.md)
 
 ----
 #### Author: [Mahmoud Parsian](https://www.linkedin.com/in/mahmoudparsian/)

code/bonus_chapters/dataframes/explode_arrays_into_rows/python/explode_arrays_into_rows.py

+1-1
@@ -5,7 +5,7 @@
 # Pyspark – Split multiple array columns into rows
 #-------------------------------------------------
 
-# creating a sparksession object
+# creating a SparkSession object
 spark=SparkSession.builder.getOrCreate()
 
 # now creating dataframe
+49
@@ -0,0 +1,49 @@
+## FOREWORD by Dr. Matei Zaharia
+
+### [Book: Data Algorithms with Spark by Mahmoud Parsian](https://www.amazon.com/Data-Algorithms-Spark-Recipes-Patterns/dp/1492082384/ref=asc_df_1492082384/)
+
+<div style="text-align: justify">
+When I started the Apache Spark project a decade ago,
+one of my main goals was to make it easier for a wide
+range of users to implement parallel algorithms. New
+algorithms acting on large-scale data are having a
+profound impact in all areas of computing, and I wanted
+to help developers implement new algorithms and reason
+about their performance without having to build a
+distributed system from scratch.
+</div><br>
+
+<div style="text-align: justify">
+I am therefore very excited to see this new book by
+Dr. Mahmoud Parsian on data algorithms with Spark.
+Dr. Parsian has extensive research and practical
+experience with large-scale data-parallel algorithms,
+including developing new algorithms for bioinformatics
+as the lead of Illumina’s big data team. In this book,
+he introduces Spark through its Python API, PySpark,
+and shows how to implement a wide range of useful algorithms
+efficiently using Spark’s distributed computing primitives.
+He also explains the workings of the underlying Spark engine
+and how to optimize your algorithms through techniques such
+as controlling data partitioning. This book will be a great
+resource for both readers looking to implement existing
+algorithms in a scalable fashion and readers who are developing
+new, custom algorithms using Spark.
+</div><br>
+
+<div style="text-align: justify">
+I am also thrilled that Dr. Parsian has included working
+code examples for all the algorithms he discusses, using
+real-world problems where possible. These will serve as a
+great starting point for readers who want to implement
+similar computations. Whether you intend to use these
+algorithms directly or build your own, custom algorithms
+using Spark, I hope that you enjoy this book as an introduction
+to the open-source engine, its inner workings, and the modern
+parallel algorithms that are having such a broad impact across computing.
+</div>
+
+Matei Zaharia <br>
+Assistant Professor of Computer Science, Stanford <br>
+Chief Technologist, Databricks <br>
+Original Creator of Apache Spark

images/~$ta-alg-foreword2.docx

162 Bytes
Binary file not shown.
