updated docs

mahmoudparsian · mahmoudparsian · commit 06f69a624554 · 2022-06-01T17:52:29.000-07:00
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 ## [O'Reilly book: <span style="color:red">Data Algorithms with Spark</span>](https://www.oreilly.com/library/view/data-algorithms-with/9781492082378/)
 
-## [Foreword by Dr. Matei Zaharia (Original Creator of Apache Spark)](./images/data-alg-foreword2.pdf)
+## [Foreword by Dr. Matei Zaharia (Original Creator of Apache Spark)](./images/FOREWORD_by_Dr_Matei_Zaharia.md)
 
 ----
 #### Author: [Mahmoud Parsian](https://www.linkedin.com/in/mahmoudparsian/) 
diff --git a/code/bonus_chapters/dataframes/explode_arrays_into_rows/python/explode_arrays_into_rows.py b/code/bonus_chapters/dataframes/explode_arrays_into_rows/python/explode_arrays_into_rows.py
@@ -5,7 +5,7 @@
 # Pyspark – Split multiple array columns into rows
 #-------------------------------------------------
   
-# creating a sparksession object
+# creating a SparkSession object
 spark=SparkSession.builder.getOrCreate()
   
 # now creating dataframe
diff --git a/images/FOREWORD_by_Dr_Matei_Zaharia.md b/images/FOREWORD_by_Dr_Matei_Zaharia.md
@@ -0,0 +1,49 @@
+## FOREWORD by Dr. Matei Zaharia
+
+### [Book: Data Algorithms with Spark by Mahmoud Parsian](https://www.amazon.com/Data-Algorithms-Spark-Recipes-Patterns/dp/1492082384/ref=asc_df_1492082384/)
+
+<div style="text-align: justify"> 
+When I started the Apache Spark project a decade ago, 
+one of my main goals was to make it easier for a wide 
+range of users to implement parallel algorithms. New 
+algorithms acting on large-scale data are having a 
+profound impact in all areas of computing, and I wanted 
+to help developers implement new algorithms and reason 
+about their performance without having to build a 
+distributed system from scratch.
+</div><br>
+
+<div style="text-align: justify"> 
+I am therefore very excited to see this new book by 
+Dr. Mahmoud Parsian on data algorithms with Spark. 
+Dr. Parsian has extensive research and practical 
+experience with large-scale data-parallel algorithms,
+including developing new algorithms for bioinformatics 
+as the lead of Illumina’s big data team. In this book, 
+he introduces Spark through its Python API, PySpark, 
+and shows how to implement a wide range of useful algorithms
+efficiently using Spark’s distributed computing primitives. 
+He also explains the workings of the underlying Spark engine
+and how to optimize your algorithms through techniques such
+as controlling data partitioning. This book will be a great
+resource for both readers looking to implement existing 
+algorithms in a scalable fashion and readers who are developing 
+new, custom algorithms using Spark.
+</div><br>
+
+<div style="text-align: justify"> 
+I am also thrilled that Dr. Parsian has included working 
+code examples for all the algorithms he discusses, using 
+real-world problems where possible. These will serve as a 
+great starting point for readers who want to implement 
+similar computations. Whether you intend to use these 
+algorithms directly or build your own custom algorithms 
+using Spark, I hope that you enjoy this book as an introduction 
+to the open-source engine, its inner workings, and the modern 
+parallel algorithms that are having such a broad impact across computing.
+</div>
+
+Matei Zaharia <br>
+Assistant Professor of Computer Science, Stanford <br>
+Chief Technologist, Databricks <br>
+Original Creator of Apache Spark
diff --git a/images/~$ta-alg-foreword2.docx b/images/~$ta-alg-foreword2.docx